Power law or generalized polynomial regressions with unknown real-valued exponents and
coefficients, and weakly dependent errors, are considered for observations over time, space or
space-time. Consistency and asymptotic normality of nonlinear least-squares estimates of the 
parameters are established. The joint limit distribution is singular, but can be used as a basis 
for inference on either exponents or coefficients. We discuss issues of implementation, efficiency, 
potential for improved estimation and possibilities of extension to more general or alternative 
trending models to allow for irregularly spaced data or heteroscedastic errors; though it focusses 
on a particular model to fix ideas, the paper can be viewed as offering machinery useful in 
developing inference for a variety of models in which power law trends are a component. In- 
deed, the paper also makes a contribution that is potentially relevant to many other statistical 
models: Our problem is one of many in which consistency of a vector of parameter estimates 
(which converge at different rates) cannot be established by the usual techniques for coping 
with implicitly-defined extremum estimates, but requires a more delicate treatment; we present 
a generic consistency result. 

Keywords: asymptotic normality; consistency; correlation; generalized polynomial; lattice; 
power law 

1. Introduction 

Polynomial-in-time regression is one of the longest-established tools of time series anal- 
ysis (see Jones [9]). In much empirical work, especially when stochastic trends, such 
as unit roots, are also involved, only a linear trend is countenanced, or merely a con- 
stant intercept. On the other hand, classical methods can test polynomial order when 
observations are equally spaced in time. With independent and identically distributed 
(i.i.d.) normal errors, a particularly elegant way of achieving this, with finite sample 
validity, results from an orthogonal polynomial representation - the covariance matrix 
of the least-squares estimate (LSE) is diagonalized, and contributions to the F statistic
from individual regressors are i.i.d. (see Section 3.2.2 of Anderson [1]). Asymptotic
theory is valid under much wider conditions on the errors; indeed from Section 7.4 of 
Grenander and Rosenblatt [5], the LSE is asymptotically efficient (in the Gauss-Markov
sense) when the (possibly non-Gaussian) errors are covariance stationary with spectral
density bounded and bounded away from zero at zero frequency, as with short memory
processes. Polynomial models have also been extended to spatial lattice data (see
Section 3.4 of Cressie [2]). 

Polynomials are nevertheless restrictive. The Weierstrass theorem justifies their uni- 
form approximation of any continuous function over a compact interval, but seems less 
practically relevant the longer the data set. Nonparametric smoothing may be unreliable 
in a series of moderate length, when instead richer parametric models than polynomi- 
als might be considered. One class that advantageously nests polynomials, which has 
received little theoretical attention, consists of "generalized polynomial" or "power law" 
models. With equally spaced time series observations $y_u$, $u = 1, \ldots, N$, consider

$$y_u = \sum_{j=1}^{p} \beta_j u^{\theta_j} + x_u, \qquad u = 1, \ldots, N, \tag{1.1}$$

where the $\theta_j$ and $\beta_j$ are real valued and all can be unknown, $\theta_j > -1/2$ for all $j$, and
the zero-mean unobservable process $x_u$ is covariance stationary with short memory. For
$\theta_j < -1/2$, $\beta_j$ would not be estimable (whether $\theta_j$ were known or unknown) because
the corresponding signal is drowned by the noise. For $\theta_j = -1/2$, $\beta_j$ is estimable but we
omit this possibility because our central limit theorem requires $\theta_j$ to lie in the interior
of a compact set. Polynomials, such as when $\theta_j = j - 1$ for all $j$, are nested; indeed this
is a hypothesis that might be tested within (1.1).
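
As a minimal sketch of the trend part of (1.1), the Python function below evaluates $\sum_j \beta_j u^{\theta_j}$ at a single time index. The function name `power_law_mean`, its argument names and the numerical values are illustrative assumptions, not taken from the paper. It shows how an ordinary polynomial arises as the special case $\theta_j = j - 1$, and how a non-integer exponent gives a trend weaker than linear.

```python
def power_law_mean(u, beta, theta):
    """Trend part of (1.1): sum over j of beta[j] * u ** theta[j], for a time index u >= 1.

    `beta` and `theta` are equal-length sequences of coefficients and exponents
    (hypothetical argument names chosen for this sketch).
    """
    if len(beta) != len(theta):
        raise ValueError("beta and theta must have the same length")
    return sum(b * u ** t for b, t in zip(beta, theta))


# Polynomial special case theta_j = j - 1: the trend 1 + 2u + 3u^2 at u = 1, 2, 3.
quadratic = [power_law_mean(u, [1.0, 2.0, 3.0], [0, 1, 2]) for u in (1, 2, 3)]

# A non-integer exponent: the trend 2 * u^0.5 evaluated at u = 4.
root_trend = power_law_mean(4, [2.0], [0.5])
```

Testing the polynomial hypothesis mentioned above amounts to asking whether the estimated exponents are compatible with the integers $0, 1, \ldots, p - 1$.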

We consider the nonlinear least-squares estimate (NLSE) of the $\theta_j$, $\beta_j$ in (1.1) and,
more generally, of exponents and coefficients in an extended model defined on a lattice, 
applying to spatial and spatio-temporal data, where our provision, for example, for weaker 
trends than linear ones and for decaying trends seems practically useful. Unlike the LSE 
when exponents are known, the NLSE cannot be expressed in closed form and requires 
numerical optimization. Correspondingly, asymptotic theory, with sample size N increas- 
ing, is needed to justify rules of statistical inference even when errors are Gaussian. We 
establish consistency and asymptotic normality for the NLSE of exponent and coefficient 
estimates, achieving also an analogous efficiency bound to that described above. As with 
other implicitly defined estimates, asymptotic distribution theory makes use (in applica- 
tion of the mean value theorem) of an initial consistency proof. Many such proofs (see 
Jennrich [8], Malinvaud [12]) require regressors to be non-trending, whence under suitable 
additional conditions all parameter estimates are $N^{1/2}$-consistent. For the NLSE of (1.1),
Wu [21] significantly relaxed this requirement but nevertheless appears to heavily restrict 
the diversity of trends. The discussion after Assumptions A and A' of Wu [21] indicates
that they reduce in (1.1) with known $\theta_j$ to the assumption $\max_j \theta_j < \frac{1}{2} + 2 \min_j \theta_j$, and
no weaker requirement suffices in the case of unknown $\theta_j$. Example 4 of Wu [21] addressed
the latter case but with $p = 1$ only (and for $\theta_1 \in (-\frac{1}{2}, 0]$) when the inequality is trivially
satisfied. In general, more elaborate techniques seem required to establish consistency 
in (1.1). Moreover, Wu [21] established consistency with no rate, whereas we find that 
a slow rate of convergence in the $\theta_j$ estimates is required before asymptotic normality
is established. Wu [21] also established asymptotic normality of the NLSE in a quite 
general setting, but under the assumption that all parameter estimates converge at the 
same rate. This is not the case with (1.1); indeed all rates of $\theta_j$, $\beta_j$ estimates turn out
to differ. For implicitly defined extremum estimates such variation is typically associated 
with difficulty in the initial consistency proof due to the objective function not converg- 
ing uniformly to a function that is uniquely optimized over the whole parameter space. 
Consistency proofs here have tended to be geared to the case at hand (see e.g. Giraitis, 
Hidalgo and Robinson [4], Nagaraj and Fuller [13], Nielsen [14], Robinson [15], Sun and 
Phillips [17]). Our consistency proof employs a generic result (presented and proved in 
Appendix A to avoid interrupting the flow) that seems likely to apply to a quite general 
class of estimates (not just the NLSE) of a variety of models. Our asymptotic distribution 
theory of estimates for (1.1) and its extension presents some other unusual features. 

The following section presents the model, regularity conditions and three theorems 
describing asymptotic statistical properties. The main details of their proofs appear in 
Appendix B. These use a series of propositions, stated and proved in Appendix C, and 
relying in turn also on a series of lemmas, in Appendix D. A Monte Carlo study of 
finite sample performance appears in Section 3, while Section 4 discusses aspects of the 
theoretical results and their implementation, with possible extensions. 

2. Estimation of spatial lattice regression model 

Let the integer $d \ge 1$ represent the dimension on which data are observed, where $d = 1$
for time series (as in (1.1)) and $d \ge 2$ for spatial or spatio-temporal data. Generalize $u$
to the $d$-dimensional multi-index $u = (u_1, u_2, \ldots, u_d)'$. Denoting $Z_+ = \{j: j = 0, 1, \ldots\}$,
generalize (1.1) to

$$y_u = \sum_{i=1}^{d} \sum_{j=1}^{p_i} \beta_{ij} u_i^{\theta_{ij}} + x_u = f(u;\theta)'\beta + x_u, \qquad u \in Z_+^d, \tag{2.1}$$

where $x_u$ is described subsequently and $\beta = (\beta_1', \ldots, \beta_d')'$, $\beta_i = (\beta_{i1}, \ldots, \beta_{ip_i})'$,
$\theta = (\theta_1', \ldots, \theta_d')'$, $\theta_i = (\theta_{i1}, \ldots, \theta_{ip_i})'$, $f(u;\theta) = (f_1(u_1;\theta_1)', \ldots, f_d(u_d;\theta_d)')'$,
$f_i(u_i;\theta_i) = (u_i^{\theta_{i1}}, \ldots, u_i^{\theta_{ip_i}})'$, for $i = 1, \ldots, d$. Defining $p = p_1 + \cdots + p_d$, the
$p \times 1$ vectors $\beta$ and $\theta$ are supposed unknown. Any $f_i(u_i;\theta_i)$ might be absent from
$f(u;\theta)$ when corresponding $\theta_i$ and $\beta_i$ are void; we proceed as if corresponding $p_i$ and
sums over $j = 1, \ldots, p_i$ are zero, avoiding indicator functions to describe such circumstances.

Our consistency proof confines the NLSE of $\theta$ to a compact set. Prescribe an (arbitrarily
small) positive $\delta$, and for each $i = 1, \ldots, d$, prescribe $\underline{\Delta}_i$, $\overline{\Delta}_i$ such that
$-1/2 < \underline{\Delta}_i < \overline{\Delta}_i < \infty$, and define

$$\Theta_i = \{h_1, \ldots, h_{p_i}: h_1 \ge \underline{\Delta}_i;\ h_j - h_{j-1} \ge \delta,\ j = 2, \ldots, p_i;\ h_{p_i} \le \overline{\Delta}_i\} \tag{2.2}$$

and $\Theta = \prod_{i=1}^{d} \Theta_i$. We introduce two assumptions that imply identifiability of $\theta$ and $\beta$.

Assumption 1. $\theta \in \Theta$.

Assumption 2. $\theta_{ij} = 0$ for at most one $(i, j)$; $\beta_{ij} \ne 0$ for all $(i, j)$.

Assumption 1 implies

$$-1/2 < \theta_{i1} < \cdots < \theta_{ip_i} < \infty, \qquad i = 1, \ldots, d. \tag{2.3}$$

The ordering in (2.3) is arbitrary, and distinctness of the $\theta_{ij}$ across $j$ along with the
first part of Assumption 2 identifies $\beta$; note that $u_i^0 = 1$ for all $i$ and that we allow an
intercept but do not require one. The second part of Assumption 2 identifies $\theta$.
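
The restrictions defining each set in (2.2) can be sketched as a simple membership check. This is only an illustration: the function name `in_theta_i` and its arguments are hypothetical, and the weak inequalities are an assumption made here because the set is required to be compact.

```python
def in_theta_i(h, lower, upper, delta):
    """Return True if the exponent vector h satisfies the restrictions in (2.2).

    h      -- candidate exponents (h_1, ..., h_p) for one dimension, in increasing order
    lower  -- lower bound on h_1 (greater than -1/2)
    upper  -- upper bound on h_p
    delta  -- minimum gap between successive exponents
    Weak inequalities are assumed throughout.
    """
    if len(h) == 0:
        raise ValueError("h must contain at least one exponent")
    if h[0] < lower or h[-1] > upper:
        return False
    # Successive exponents must be separated by at least delta.
    return all(later - earlier >= delta for earlier, later in zip(h, h[1:]))


ok = in_theta_i([0.0, 1.0, 2.0], -0.45, 4.0, 0.1)        # well separated, inside bounds
too_close = in_theta_i([0.0, 0.05], -0.45, 4.0, 0.1)     # gap smaller than delta
too_small = in_theta_i([-0.6], -0.45, 4.0, 0.1)          # below the lower bound
```

The gap requirement is what keeps the exponents within one dimension distinct, which is the distinctness used for identification above.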

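Given $N = \prod_{i=1}^{d} n_i$ observations on $y_u$, $u \in \mathbb{N} = \mathbb{N}_1 \times \cdots \times \mathbb{N}_d$ and $\mathbb{N}_i = (1, \ldots, n_i)$,
define the NLSE of $\beta$, $\theta$ by $(\hat\beta, \hat\theta) = \arg\min_{b \in R^p, h \in \Theta} Q(b, h)$, where
$Q(b, h) = \sum_{u \in \mathbb{N}} \{y_u - b'f(u; h)\}^2$. Asymptotic theory requires further assumptions. Let $Z = \{j: j = 0, \pm 1, \ldots\}$.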
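
Because $Q(b, h)$ is quadratic in $b$ for fixed $h$, the coefficients can be concentrated out and the search carried out over exponents only. The following is a minimal sketch for the simplest case $d = 1$, $p = 1$. A plain grid search stands in for a numerical optimiser, and the function names, the interval $[-0.45, 4]$ and the grid size are assumptions made for illustration, not the paper's implementation.

```python
def concentrated_fit(y, h):
    """For a fixed exponent h, minimise Q(b, h) over b in closed form (d = 1, p = 1)."""
    f = [u ** h for u in range(1, len(y) + 1)]          # regressor u^h, u = 1, ..., N
    b = sum(fu * yu for fu, yu in zip(f, y)) / sum(fu * fu for fu in f)
    q = sum((yu - b * fu) ** 2 for fu, yu in zip(f, y))  # residual sum of squares
    return b, q


def nlse_grid(y, lower=-0.45, upper=4.0, steps=890):
    """Grid search over h in [lower, upper]; returns (b_hat, h_hat, Q(b_hat, h_hat))."""
    best = None
    for k in range(steps + 1):
        h = lower + (upper - lower) * k / steps
        b, q = concentrated_fit(y, h)
        if best is None or q < best[2]:
            best = (b, h, q)
    return best


# Noise-free check: data generated with coefficient 2 and exponent 0.5.
y = [2.0 * u ** 0.5 for u in range(1, 51)]
b_hat, h_hat, q_hat = nlse_grid(y)
```

With noisy data the grid minimiser would normally be refined by a local optimiser; the grid alone limits the precision of the exponent estimate to the grid spacing.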

Assumption 3. $x_u$, $u \in Z^d$, is covariance stationary with zero mean, and its autocovariance
function, $\gamma_u = \mathrm{cov}(x_t, x_{t+u})$, for the multi-index $t = (t_1, \ldots, t_d)'$, satisfies
$\sum_{u \in Z^d} |\gamma_u| < \infty$.

Our parameter estimates make no attempt to correct for this possible nonparametric 
weak dependence of the $x_u$ (permitted also in Assumption 5), and Cressie [2], page 25,
stresses the importance of mean function specification relative to error specification. 
However, the NLSE turns out to be not only consistency-robust to spatial correlation 
but also asymptotically Gauss-Markov efficient. 

The next assumption, of increase with algebraic rate of observations in all dimensions, 
is capable of generalization but is employed for simplicity. 

Assumption 4. $n_i \sim B_i N^{b_i}$, $i = 1, \ldots, d$, as $N \to \infty$, where $B_i > 0$, $b_i > 0$, $i = 1, \ldots, d$,

Define $\zeta_{ij} = b_i \theta_{ij}$ and, with no loss of generality, identify dimension $i = 1$ such that

$$\zeta_{11} = \min_{1 \le i \le d} \zeta_{i1}, \tag{2.4}$$

where, if two or more $i$ satisfy (2.4), an arbitrary choice is made. Note that $\zeta_{11} + b_1/2 > 0$
is implied by $\theta_{11} + 1/2 > 0$.

Theorem 1. Let Assumptions 1-4 hold. Then for $j = 1, \ldots, p_i$, $i = 1, \ldots, d$, as $N \to \infty$,

$$\hat\theta_{ij} - \theta_{ij} = O_p(N^{\chi - \zeta_{ij} - 1/2}) \tag{2.5}$$

for any $\chi > 0$.

The proof is in Appendix B. As is common with initial consistency proofs, a sharp rate
(corresponding to $\chi = 0$ in (2.5)) is not delivered (smoothness conditions, in particular,
are not exploited). Theorem 1 is used in the proof of our central limit theorem (CLT),
for which we also need consistency, with a rate, for $\hat\beta$. We state this result without the
proof, which is a relatively straightforward application of Theorem 1, techniques used in
its proof, Theorem 3 below and routine manipulations.
Theorem 2. Let Assumptions 1-4 hold. Then, for $j = 1, \ldots, p_i$, $i = 1, \ldots, d$,

$$\hat\beta_{ij} = \beta_{ij} + O_p((\log N) N^{\chi - \zeta_{ij} - 1/2}), \qquad \text{as } N \to \infty.$$

The relative rates for the $\hat\theta_{ij}$ and $\hat\beta_{ij}$ in Theorems 1 and 2 are matched by relative
rates that feature in our CLT. For this we introduce first

Assumption 5. $x_u = \sum_{v \in Z^d} \xi_v \varepsilon_{u-v}$, $\sum_{v \in Z^d} |\xi_v| < \infty$, $u \in Z^d$, where $v$ is the multi-index
$v = (v_1, \ldots, v_d)'$, $\{\varepsilon_u, u \in Z^d\}$ are independent random variables with zero mean and unit
variance, $\{\varepsilon_u^2, u \in Z^d\}$ are uniformly integrable and $\sum_{v \in Z^d} \xi_v \ne 0$.

Assumption 5 implies Assumption 3, and both imply the existence and boundedness
of the spectral density $F(\lambda) = (2\pi)^{-1} |\sum_{v \in Z^d} \xi_v e^{i v'\lambda}|^2$, where $\lambda$ is the multi-index
$\lambda = (\lambda_1, \ldots, \lambda_d)'$, while Assumption 5 also implies $F(0) > 0$. Stationary invertible
autoregressive moving averages are among time series processes covered by Assumption 5,
as are spatial generalizations of these (see e.g. Hallin, Lu and Tran [6], Robinson and
Vidal Sanz [16], Tjøstheim [18, 19], Yao and Brockwell [24]). Mixing conditions, such as
ones employed in a spatial context by Gao, Lu and Tjøstheim [3], Hallin, Lu and Yu [7],
and Lu, Lundervold, Tjøstheim and Yao [11], provide an alternative route for establishing
a CLT, but are not strictly weaker or stronger than Assumption 5, which we prefer here
because $x_u$, unlike processes considered in the latter references, is involved only linearly.

Let $I_r$ be the $r$-rowed identity matrix, $\otimes$ denote the Kronecker product, and
introduce $p \times p$ matrices $D = N^{1/2} \mathrm{diag}\{n_1^{\theta_{11}}, \ldots, n_1^{\theta_{1p_1}}, \ldots, n_d^{\theta_{d1}}, \ldots, n_d^{\theta_{dp_d}}\}$,
$L(s) = \mathrm{diag}\{L_1(s_1), \ldots, L_d(s_d)\}$, where $L_i(s_i) = (\log s_i) I_{p_i}$, and $(2p \times 2p)$ matrices
$D_+ = I_2 \otimes D$ and $L_+ = \mathrm{diag}\{I_p, L(n)\}$. Define $\alpha = (\theta', \beta')'$, $\hat\alpha = (\hat\theta', \hat\beta')'$. Denote by
$\mathcal{N}_r(a, A)$ an $r$-dimensional normal vector with mean vector $a$ and (possibly singular)
covariance matrix $A$. Appendix B defines the $p \times p$ matrix $T$ and $p \times 2p$ matrix $B$ and proves:

Theorem 3. Let Assumptions 1, 2 and 5 hold. Then as $N \to \infty$,

$$D_+ L_+^{-1}(\hat\alpha - \alpha) \to_d \mathcal{N}_{2p}(0, 2\pi F(0) B'T^{-1}B).$$
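
To make the norming in Theorem 3 concrete, the sketch below computes the diagonal elements that scale the exponent and coefficient estimates. It assumes the norming matrix is read as $D_+ L_+^{-1}$, so that each exponent estimate is scaled by $N^{1/2} n_i^{\theta_{ij}}$ and each coefficient estimate by the same quantity divided by $\log n_i$. The function name and argument layout are hypothetical.

```python
import math


def norming_factors(n, theta):
    """Diagonal elements of D and of D L(n)^{-1} for dimension sizes n and exponents theta.

    n[i] is n_i and theta[i] is the list of exponents theta_ij for dimension i
    (hypothetical argument layout). Every n_i must exceed 1 so that log n_i > 0.
    """
    total = math.prod(n)                      # N = n_1 * ... * n_d
    d_diag, dl_inv_diag = [], []
    for n_i, theta_i in zip(n, theta):
        for t in theta_i:
            d_ij = math.sqrt(total) * n_i ** t
            d_diag.append(d_ij)                        # scaling of the exponent estimate
            dl_inv_diag.append(d_ij / math.log(n_i))   # scaling of the coefficient estimate
    return d_diag, dl_inv_diag


# Illustrative design: a 10 x 10 lattice with a single unit exponent in each dimension.
theta_rates, beta_rates = norming_factors([10, 10], [[1.0], [1.0]])
```

Under this reading the coefficient scaling is smaller than the exponent scaling by the factor $\log n_i$, which corresponds to coefficient estimates converging slightly more slowly than exponent estimates.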

3. Finite sample properties 

A small Monte Carlo study provides some information on finite sample performance.
Issues of concern, given unknown $\theta$, are bias and variability of the NLSE and accuracy of
large sample inference rules suggested by Theorem 3. We employed (2.1) with $d = 2$,
$p_1 = p_2 = 1$, picking two $(\theta_1, \theta_2) = (\theta_{11}, \theta_{21})$ combinations - (1, 1), (0.5, 2) - but throughout
took $\Theta_i = [-0.45, 4]$, $\beta_i = \beta_{i1} = 1$, $i = 1, 2$. We varied $N$ absolutely and also the relative
$n_1$, $n_2$, taking $(n_1, n_2) = (8, 12), (10, 10), (11, 20), (15, 15)$.

Our first experiment took the $x_u$ to be i.i.d. $\mathcal{N}_1(0, 1)$ variables. Tables 1 and 2 report,
for the respective parameter combinations, bias (BIAS), mean squared error (MSE), and
empirical size at 5% (SIZE5) and 1% (SIZE1) for the NLSE $\hat\theta_i$, $\hat\beta_i$ and also $\tilde\beta_i$, the
LSE of $\beta_i$ that correctly assumes $\theta$, for $i = 1, 2$, across 1000 replications. The sizes were
proportions of significant estimates, using normal critical values scaled by estimated
standard deviations which, in the case of the $\hat\theta_i$, $\hat\beta_i$, were computed on the basis of
Theorem 3 with current parameter estimates replacing true values of $\theta$, $\beta$, and $2\pi F(0)$
replaced by the sum of squared residuals divided by $N$ (so the spatial independence of
the $x_u$ was treated as known, as it was also in the conventional scaling used for the $\tilde\beta_i$).

Table 1. $\theta_1 = 1$, $\theta_2 = 1$, $\beta_1 = 1$, $\beta_2 = 1$, $\sigma^2 = 1$, $x_u$ i.i.d.

$n_1$  $n_2$           $\hat\theta_1$   $\hat\theta_2$   $\hat\beta_1$    $\tilde\beta_1$  $\hat\beta_2$    $\tilde\beta_2$
8      12      BIAS     0.008            0.007            0.024            0.000            0.017            0.000
               MSE      0.016            0.007            0.080            0.001            0.051            0.000
               SIZE5    0.100            0.125            0.151            0.048            0.166            0.055
               SIZE1    0.044            0.048            0.075            0.010            0.084            0.010
10     10      BIAS     0.005            0.009            0.016           -0.001            0.009            0.002
               MSE      0.010            0.009            0.060            0.006            0.063            0.007
               SIZE5    0.132            0.132            0.180            0.053            0.186            0.051
               SIZE1    0.055            0.050            0.084            0.015            0.090            0.011
11     20      BIAS    -0.002            0.002            0.016            0.000           -0.007            0.000
               MSE      0.003            0.001            0.022            0.000            0.010            0.000
               SIZE5    0.086            0.104            0.115            0.039            0.120            0.051
               SIZE1    0.030            0.039            0.051            0.005            0.049            0.012
15     15      BIAS     0.003            0.002            0.006            0.000           -0.001            0.000
               MSE      0.002            0.002            0.013            0.000            0.013            0.000
               SIZE5    0.074            0.075            0.108            0.043            0.103            0.039
               SIZE1    0.024            0.022            0.033            0.010            0.037            0.010
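
The empirical sizes described above (proportions of significant estimates judged against normal critical values) can be sketched as follows. The function name `empirical_size` is hypothetical, the test is assumed here to be two-sided and centred at the true parameter value, and the four replications are made-up numbers used only to exercise the code.

```python
from statistics import NormalDist


def empirical_size(estimates, std_errors, true_value, level=0.05):
    """Proportion of replications in which |estimate - true_value| / std_error exceeds
    the two-sided standard normal critical value for the given nominal level."""
    critical = NormalDist().inv_cdf(1.0 - level / 2.0)
    rejections = sum(
        1 for est, se in zip(estimates, std_errors)
        if abs(est - true_value) / se > critical
    )
    return rejections / len(estimates)


# Four made-up replications of an estimate whose true value is 1.
estimates = [1.0, 1.5, 0.0, 1.1]
std_errors = [0.2, 0.2, 0.2, 0.2]
size5 = empirical_size(estimates, std_errors, 1.0, level=0.05)
size1 = empirical_size(estimates, std_errors, 1.0, level=0.01)
```

A well-calibrated test would give values close to the nominal levels 0.05 and 0.01 across a large number of replications; values well above them indicate over-sizing.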

The tables reveal a definite inferiority of the NLSE relative to the LSE, but unsurprisingly,
as the LSE is exactly unbiased, more efficient and yields exact critical regions.
Though the NLSE-based tests on $\beta$ are nearly always over-sized, this phenomenon diminishes
with increased $N$, and overall the discrepancy between the performances of the
two classes of the $\beta$ estimate does not seem very serious. There is also a predominant
over-sizing of the tests on $\theta$, but again this falls as $N$ increases, and, in Table 2 in particular,
it is often modest. There is a tendency for the NLSE to over-estimate, but for $\beta$
biases only exceed 2% of the parameter value when $n_1 = 8$ and $n_2 = 12$. For $\theta$ they never
reach 1%, while overall they mostly fall with increasing $N$, as does the MSE. In Table 2,
the results are not in line with what the rates in Theorem 3 suggest, because the fall in
MSE is greater for $\hat\theta_2$ and $\hat\beta_2$ than for $\hat\theta_1$ and $\hat\beta_1$, despite the fact that $\theta_1 = 2$ and $\theta_2 = \frac{1}{2}$.
Nevertheless, it is not clear to what extent one would expect asymptotic theory to predict
comparisons at this level of refinement in such sample sizes. Note that the Monte Carlo
results are also difficult to judge relative to the theory because the various $n_i$ did not
result from fixing the $b_i$ and $B_i$ and then increasing $N$, but were chosen with a view to
representing some variability in $N$, and in the relative $n_1$ and $n_2$. In addition, the convergence
rates of $\hat\theta_i$ and $\hat\beta_i$ do not only depend on $n_i$, but on the overall $N$. Other results are
more closely in line with the asymptotic theory. This is the case in Table 1 where, with
$\theta_1 = \theta_2 = 1$, the above MSE ratios are sometimes greater for $\hat\theta_2$ and/or $\hat\beta_2$ and sometimes
less. It is also the case in Table 2 for the LSE $\tilde\beta_i$, as elsewhere, that comparisons are
sometimes difficult as a number of MSEs are zero to 3, and even to 4 (unreported here),
decimal places.

Table 2. $\theta_1 = 2$, $\theta_2 = 1/2$, $\beta_1 = 1$, $\beta_2 = 1$, $\sigma^2 = 1$, $x_u$ i.i.d.

$n_1$  $n_2$           $\hat\theta_1$   $\hat\theta_2$   $\hat\beta_1$    $\tilde\beta_1$  $\hat\beta_2$    $\tilde\beta_2$
8      12      BIAS     0.008            0.001            0.024            0.003           -0.002           -0.000
               MSE      0.014            0.001            0.071            0.005            0.001            0.000
               SIZE5    0.063            0.060            0.087            0.077            0.053            0.090
               SIZE1    0.028            0.012            0.038            0.029            0.014            0.034
10     10      BIAS     0.008            0.000            0.020            0.004            0.000           -0.000
               MSE      0.013            0.003            0.074            0.004            0.001            0.000
               SIZE5    0.069            0.057            0.101            0.058            0.065            0.039
               SIZE1    0.033            0.013            0.047            0.015            0.017            0.009
11     20      BIAS     0.005           -0.000           -0.001           -0.002            0.000            0.000
               MSE      0.005            0.000            0.028            0.002            0.000            0.000
               SIZE5    0.052            0.054            0.069            0.030            0.059            0.041
               SIZE1    0.017            0.012            0.017            0.012            0.011            0.006
15     15      BIAS     0.002            0.001            0.004            0.001            0.004            0.000
               MSE      0.004            0.001            0.025            0.001            0.000            0.000
               SIZE5    0.058            0.044            0.070            0.081            0.043            0.055
               SIZE1    0.018            0.011            0.019            0.019            0.010            0.020

Next we considered the effect of dependence, employing three different models for $x_u$,
again with $d = 2$. All models entailed weak dependence, with varying spans, but in the
first dependence was negative, so that the spectral density at zero was small, whereas in
the other two it was positive, producing a peaked spectral density. In the following,
$\varepsilon_u \sim$ i.i.d. $\mathcal{N}_1(0, 1)$.

1. Multiple direction MA(1):

$$x_u = \varepsilon_u - 0.12 \sum_{j=-1}^{1} \sum_{\substack{k=-1 \\ (j,k) \ne (0,0)}}^{1} \varepsilon_{u_1+j, u_2+k}, \qquad u_i = 1, \ldots, n_i,\ i = 1, 2. \tag{3.1}$$

2. Multilateral MA(4), no interactions:

$$x_u = \varepsilon_u + \sum_{\substack{j=-4 \\ j \ne 0}}^{4} a_{|j|} (\varepsilon_{u_1+j, u_2} + \varepsilon_{u_1, u_2+j}), \qquad u_i = 1, \ldots, n_i,\ i = 1, 2, \tag{3.2}$$

for $a_1 = 0.14$, $a_2 = 0.12$, $a_3 = 0.1$, $a_4 = 0.08$.

3. Bilateral MA(9), on diagonal:

$$x_u = \varepsilon_u + \sum_{j=-9}^{9} (0.95)^{|j|} \varepsilon_{u_1+j, u_2+j}, \qquad u_i = 1, \ldots, n_i,\ i = 1, 2. \tag{3.3}$$

Table 3. $\theta_1 = 1$, $\theta_2 = 1$, $\beta_1 = 1$, $\beta_2 = 1$, $\sigma^2 = 1$, $x_u$ = (3.1)

$n_1$  $n_2$           $\hat\theta_1$   $\hat\theta_2$   $\hat\beta_1$    $\tilde\beta_1$  $\hat\beta_2$    $\tilde\beta_2$
8      12      BIAS     0.005            0.003            0.006            0.000            0.002           -0.000
               MSE      0.006            0.003            0.031            0.000            0.021            0.000
10     10      BIAS     0.005            0.001            0.001            0.000            0.008           -0.000
               MSE      0.003            0.003            0.023            0.000            0.023            0.000
11     20      BIAS     0.001            0.0001           0.001           -0.000            0.001            0.000
               MSE      0.001            0.000            0.006            0.000            0.003            0.000
15     15      BIAS     0.002           -0.001           -0.003           -0.000            0.005            0.000
               MSE      0.000            0.000            0.004            0.000            0.004            0.000
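
For the multiple direction MA(1) errors in (3.1), the sketch below builds the moving-average weights, evaluates the squared sum of weights, and simulates one realisation on a small lattice. It assumes unit-variance innovations and the $(2\pi)^{-1}$ normalisation of the spectral density, under which the squared sum of weights equals $2\pi F(0)$. All function names are hypothetical, and the simulation pads the innovation array by the reach of the weights so that every lattice point uses a full set of neighbours.

```python
import random


def ma_weights_31(c=-0.12):
    """Weights of model (3.1): 1 at the origin and c on each of the 8 nearest neighbours."""
    weights = {(0, 0): 1.0}
    for j in (-1, 0, 1):
        for k in (-1, 0, 1):
            if (j, k) != (0, 0):
                weights[(j, k)] = c
    return weights


def long_run_variance(weights):
    """Squared sum of the weights; equals 2*pi*F(0) for unit-variance innovations."""
    return sum(weights.values()) ** 2


def simulate_ma_field(weights, n1, n2, rng):
    """Draw x_u = sum_v weights[v] * eps_{u-v} on the lattice {1..n1} x {1..n2}."""
    reach = max(max(abs(j), abs(k)) for j, k in weights)
    eps = {(a, b): rng.gauss(0.0, 1.0)
           for a in range(1 - reach, n1 + reach + 1)
           for b in range(1 - reach, n2 + reach + 1)}
    return {(u1, u2): sum(c * eps[(u1 - j, u2 - k)] for (j, k), c in weights.items())
            for u1 in range(1, n1 + 1) for u2 in range(1, n2 + 1)}


lrv = long_run_variance(ma_weights_31())          # (1 - 8 * 0.12)^2
field = simulate_ma_field(ma_weights_31(), 8, 12, random.Random(0))
```

The very small value of `lrv` reflects the negative dependence in this model, which makes the spectral density at zero frequency small.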

For the same parameter values as before, bias and MSE of the LSE and NLSE are
presented in Tables 3-8, with Tables 3 and 4 referring to (3.1), Tables 5 and 6 to (3.2),
and Tables 7 and 8 to (3.3). As before the LSE $\tilde\beta_1$, $\tilde\beta_2$ are exactly unbiased, as the Monte
Carlo results tend to illustrate. However, perhaps surprisingly, the dependent model (3.3)
produces some very large biases in the NLSE $\hat\beta_1$, though not so much in $\hat\beta_2$, $\hat\theta_1$, $\hat\theta_2$. For
the other dependence models the NLSE biases are not necessarily greater than under
independence. The MSE magnitudes are not directly comparable to those of Tables 1
and 2, because scales were not calibrated, but a similar overall picture emerges: the NLSE
of $\beta$ often has much greater MSE than the LSE, but this falls with increasing $N$, as does
that of the NLSE of $\theta$. In Tables 4, 6 and 8, where $\theta_1 = 2$, $\theta_2 = \frac{1}{2}$, the same somewhat
surprising feature as noted in Table 2 appears, with $\hat\theta_1$ and $\hat\beta_1$ improving less than $\hat\theta_2$
and $\hat\beta_2$ with increasing $N$, and the only additional point to add to our previous discussion
is that convergence is often expected to be slowed by dependence.

Table 4. $\theta_1 = 2$, $\theta_2 = 1/2$, $\beta_1 = 1$, $\beta_2 = 1$, $\sigma^2 = 1$, $x_u$ = (3.1)

$n_1$  $n_2$           $\hat\theta_1$   $\hat\theta_2$   $\hat\beta_1$    $\tilde\beta_1$  $\hat\beta_2$    $\tilde\beta_2$
8      12      BIAS     0.003            0.000            0.003           -0.000           -0.000            0.000
               MSE      0.003            0.000            0.017            0.001            0.000            0.000
10     10      BIAS    -0.003            0.000            0.014           -0.001           -0.001            0.000
               MSE      0.003            0.000            0.018            0.001            0.000            0.000
11     20      BIAS    -0.000            0.000            0.003            0.000           -0.000           -0.000
               MSE      0.001            0.000            0.004            0.000            0.000            0.000
15     15      BIAS    -0.001            0.000            0.005            0.001           -0.000           -0.000
               MSE      0.000            0.000            0.004            0.000            0.000            0.000

Table 5. $\theta_1 = 1$, $\theta_2 = 1$, $\beta_1 = 1$, $\beta_2 = 1$, $\sigma^2 = 1$, $x_u$ = (3.2)

$n_1$  $n_2$           $\hat\theta_1$   $\hat\theta_2$   $\hat\beta_1$    $\tilde\beta_1$  $\hat\beta_2$    $\tilde\beta_2$
8      12      BIAS     0.032            0.020            0.053           -0.001            0.035            0.000
               MSE      0.050            0.026            0.249            0.004            0.169            0.002
10     10      BIAS     0.029            0.020            0.017           -0.005            0.047            0.003
               MSE      0.031            0.031            0.181            0.003            0.177            0.003
11     20      BIAS     0.010            0.003            0.017           -0.001            0.015            0.001
               MSE      0.013            0.004            0.091            0.001            0.045            0.000
15     15      BIAS     0.008            0.007            0.006           -0.001            0.014            0.000
               MSE      0.007            0.008            0.059            0.000            0.060            0.001

4. Final comments 

1. For known $\theta$, long-established techniques (see [1], Section 2.6) give $D(\tilde\beta(\theta) - \beta) \to_d \mathcal{N}_p(0, 2\pi F(0)\Phi^{-1})$
(where $\Phi$ is defined near the start of Appendix B below), so ignorance
of $\theta$ incurs not only efficiency loss, but slightly slower convergence. Theorem 3 also implies
a singularity in the limit distribution, whose covariance matrix has rank $p$ only. This is
due to bias in $\hat\beta$, which on expansion is seen to have a term linear in $\hat\theta - \theta$ that dominates
the contribution from $\sum_{u \in \mathbb{N}} f(u;\theta)x_u$. Nevertheless, Theorem 3 does provide separate
inference on $\beta$ (moreover, one can conduct joint inference that does not cover both $\theta_{ij}$
and $\beta_{ij}$ for any $(i, j)$) though, given Assumption 1, we cannot test zero restrictions on $\beta$.

In our setting, $\beta$ may be of less initial interest than $\theta$, and Theorem 3 allows inference
on $\theta$ with $\hat\theta$ converging slightly faster than $\hat\beta$, and at what appears to be the optimal rate
for this problem.



Table 6. $\theta_1 = 2$, $\theta_2 = 1/2$, $\beta_1 = 1$, $\beta_2 = 1$, $\sigma^2 = 1$, $x_u$ = (3.2)

$n_1$  $n_2$           $\hat\theta_1$   $\hat\theta_2$   $\hat\beta_1$    $\tilde\beta_1$  $\hat\beta_2$    $\tilde\beta_2$
8      12      BIAS     0.064            0.001            0.048            0.005           -0.001           -0.000
               MSE      0.115            0.000            0.272            0.024            0.003            0.000
10     10      BIAS     0.067           -0.001            0.023           -0.002            0.005            0.000
               MSE      0.111            0.001            0.267            0.019            0.005            0.000
11     20      BIAS     0.019            0.000            0.035            0.000           -0.001            0.000
               MSE      0.027            0.000            0.151            0.009            0.000            0.000
15     15      BIAS     0.008            0.000            0.046           -0.002           -0.001            0.000
               MSE      0.020            0.000            0.143            0.007            0.001            0.000

Table 7. $\theta_1 = 1$, $\theta_2 = 1$, $\beta_1 = 1$, $\beta_2 = 1$, $\sigma^2 = 1$, $x_u$ = (3.3)

$n_1$  $n_2$           $\hat\theta_1$   $\hat\theta_2$   $\hat\beta_1$    $\tilde\beta_1$  $\hat\beta_2$    $\tilde\beta_2$
8      12      BIAS     0.074            0.096            0.154            0.008            0.091           -0.004
               MSE      0.129            0.157            0.738            0.048            0.549            0.024
10     10      BIAS     0.041            0.069            0.105           -0.008            0.050            0.008
               MSE      0.080            0.097            0.455            0.033            0.371            0.032
11     20      BIAS     0.016            0.036            0.134            0.0010           0.017           -0.000
               MSE      0.043            0.032            0.462            0.014            0.232            0.005
15     15      BIAS     0.013            0.024            0.061           -0.003            0.028            0.002
               MSE      0.026            0.026            0.214            0.009            0.182            0.009



2. If independence of the $x_u$ is not assumed, the limiting covariance matrix in Theorem 3
can be consistently estimated (under additional conditions) by replacing $F(0)$ by
a parametric or smoothed nonparametric estimate based on NLSE residuals.

3. The form of the limiting covariance matrix in Theorem 3, with dependence simply
reflected in the scale factor $2\pi F(0)$, suggests that a generalized NLSE, which corrects
parametrically or nonparametrically for correlation in $x_u$, affords no efficiency improvement
(cf. Section 7.4 of Grenander and Rosenblatt [5]).

4. On the other hand, our estimates are not Fisher efficient for non-Gaussian $x_u$. Departures
from Gaussianity might be detected by, for example, nonparametric probability
density estimation based on NLSE residuals; Hallin, Lu and Tran [6] studied density
estimation for linear lattice processes. More efficient parameter estimates could be obtained
by M-estimation using a correctly parameterized $\varepsilon_u$ distribution, or adapting
semi-parametrically to a nonparametric one, in either case employing parametric $\{\xi_v\}$ or
approximating them via a long autoregression. The extra proof details would be far from
trivial, but convergence rates should be unaffected, with the limiting covariance matrix
of Theorem 3 simply shrunk by a scalar factor.



Table 8. $\theta_1 = 2$, $\theta_2 = 1/2$, $\beta_1 = 1$, $\beta_2 = 1$, $\sigma^2 = 1$, $x_u$ = (3.3)

$n_1$  $n_2$           $\hat\theta_1$   $\hat\theta_2$   $\hat\beta_1$    $\tilde\beta_1$  $\hat\beta_2$    $\tilde\beta_2$
8      12      BIAS     0.063           -0.000            0.100            0.014            0.009           -0.000
               MSE      0.518            0.003            1.217            0.291            0.019            0.000
10     10      BIAS     0.098           -0.000            0.118            0.009            0.008           -0.000
               MSE      0.512            0.003            0.912            0.222            0.016            0.000
11     20      BIAS    -0.037           -0.002           -0.007           -0.001            0.008            0.000
               MSE      0.275            0.000            1.059            0.128            0.004            0.000
15     15      BIAS     0.054            0.000            0.128           -0.001            0.001            0.000
               MSE      0.226            0.000            0.616            0.086            0.003            0.000

5. Another extension allows long or negative memory, in $x_u$, bearing in mind results of
Yajima [22] for (1.1) with known integer $\theta_j$, and Yajima and Matsuda [23]; this would affect
all convergence rates by the same scalar factor, the efficiency property in Comment 3
would be lost, and negative $\theta_{ij}$, and corresponding $\beta_{ij}$, may not be estimable.

6. In an alternative formulation to (1.1), $u^{\theta_j}$ is replaced by $(u/N)^{\theta_j}$, confining the
regression to the unit interval, and (2.1) can be analogously modified. Consistency is
then much easier to prove, all exponent estimates being $\sqrt{N}$-consistent. A similar device
is employed in fixed-design nonparametric regression, but unlike there it is not essential
in order to achieve consistency in our parametric setting, where we find it aesthetically
unattractive given that $x_u$ is defined on an increasing domain.

7. The results are straightforwardly extended to allow some $\theta_{ij}$ in (2.1) to be known;
for example, to specify an intercept by $\theta_{11} = 0$, though the norming factor and limit
covariance matrix in Theorem 3 are affected.

8. Our notation suggests constant spacing between observations across all $d$ dimensions,
but allowing the interval of observation to vary with dimension affects each $\beta_{ij}$ by
a factor depending also on the corresponding $\theta_{ij}$, but not the $\theta_{ij}$ themselves.

9. Irregular spacing of observations, either due to missing data from an otherwise regular
lattice, or with observations occurring anywhere on $R^d$, can also be considered. In
both of these settings asymptotic theory requires a degree of regularity in the observation
locations, ruling out situations where observations become too sparse, for example. Given
this, the extension is relatively simple with independent $x_u$. Under dependence, asymptotic
variance formulae will be complicated by the irregular spacing and the efficiency
property of Comment 3 will be lost. In addition, different kinds of assumptions from ours
on the errors $x_u$ may be needed. In the case of missing data from an otherwise regular
lattice, our Assumptions 3 (for consistency) and 5 (for asymptotic normality) should
still suffice. But for observations anywhere on $R^d$ it would be appropriate to consider
an underlying continuous process. Then, for consistency, a suitable ergodicity property
would be needed, whereas for asymptotic normality leading possibilities that can entail
weak dependence analogous to that of Assumption 5 include suitable linear functionals
of Brownian motion and mixing conditions.

10. A Bayesian treatment would be worthwhile, with suitable priors placed on the 
exponents and possibly also the coefficients. 

11. When $d \ge 2$ a more realistic model than (2.1) might allow interaction terms, that
is, products of powers of $u_i$ and $u_k$, $i \ne k$. Our proof methods are extendable, but from
a practical perspective the curse of dimensionality threatens and the issue of parsimonious 
specification, already posed by (2.1), becomes more pressing. A penalized procedure could 
be used. 

12. Modified model classes might provide an alternative route to parsimony; for example,
one might take $p_i = 1$ with $\beta_{i1} u_i^{\theta_{i1}}$ replaced by $\beta_{i1}(u_i + \phi_{i1})^{\theta_{i1}}$ for known or
unknown $\phi_{i1}$ (cf. Example 3 of Wu [21]). Trigonometric factors might also be incorporated
(cf. Section 7.5 of Grenander and Rosenblatt [5]).

13. For alternative classes of trending model (for example, involving wavelets), asymp- 
totic estimation theory might be handled by similar techniques. 

14. An alternative practically relevant modelling of the $x_u$ treats them as heteroscedastic
but possibly independent. Broadly similar proof techniques would provide corresponding
results to ours, but the NLSE is less efficient than a suitably weighted estimate.

15. Though we have focussed on (1.1) and (2.1) to fix ideas, our methods and theory
can be developed to cover models that incorporate power law trends along with
other explanatory variables, both stochastic and non-stochastic, such as extensions of
the nonparametric and semiparametric spatial regressions considered by Gao, Lu and
Tjøstheim [3] and Lu, Lundervold, Tjøstheim and Yao [11], and so the paper can be
viewed as introducing machinery relevant to a wide variety of settings.
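
Comment 2 above mentions replacing $F(0)$ by a smoothed nonparametric estimate based on NLSE residuals. As a minimal sketch for the time series case $d = 1$ only, the function below forms a Bartlett-weighted sum of sample autocovariances as an estimate of $2\pi F(0)$. The Bartlett kernel, the function name and the bandwidth argument are assumptions chosen for illustration; the text does not prescribe a particular smoother, and the toy residual series is made up.

```python
def bartlett_long_run_variance(resid, bandwidth):
    """Bartlett-weighted sum of sample autocovariances of a 1-D residual series,
    intended as an estimate of 2*pi*F(0). bandwidth = 0 gives the sum of squared
    residuals divided by the sample size."""
    n = len(resid)

    def gamma(k):
        # Lag-k sample autocovariance; residuals are treated as having mean zero.
        return sum(resid[t] * resid[t + k] for t in range(n - k)) / n

    total = gamma(0)
    for k in range(1, bandwidth + 1):
        # Linearly declining weights keep the estimate non-negative.
        total += 2.0 * (1.0 - k / (bandwidth + 1)) * gamma(k)
    return total


# Made-up alternating residuals: strong negative first-order autocorrelation.
resid = [1.0, -1.0, 1.0, -1.0]
no_smoothing = bartlett_long_run_variance(resid, 0)
with_lag_one = bartlett_long_run_variance(resid, 1)
```

For lattice data with $d \ge 2$ the same idea would require autocovariances over multi-index lags, which this sketch does not attempt.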



Appendix A: Generic consistency theorem 

We present a consistency theorem for a general, implicitly defined extremum estimate 
under unprimitive conditions that will be checked in the paper's setting and seem capable 
of checking in a number of others. As this appendix is self-contained, there seems no risk 
of confusion in employing notations that are similar to those elsewhere in the paper 
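but can have slightly different meanings. We estimate the $p \times 1$ vector parameter $\theta$,
with elements $\theta_i$, $i = 1, \ldots, p$, by $\hat\theta = \arg\min_{h \in \Theta} R(h)$, where $R(h): \mathbb{R}^p \to \mathbb{R}$ depends on
sample size $N$ and $\Theta \subset \mathbb{R}^p$ is a fixed compact set. For positive scalars $C_{iw}$, $i = 1, \ldots, p$,
$w = 1, 2, \ldots$, depending on $N$ and such that $C_{iw} < C_{i,w+1}$, $i = 1, \ldots, p$, define $C_w =
(C_{1w}, \ldots, C_{pw})'$, and

$$\mathcal{N}_i(C_{iw}) = \{h_i: |h_i - \theta_i| < C_{iw}\}, \qquad \mathcal{N}(C_w) = \prod_{i=1}^{p} \mathcal{N}_i(C_{iw}),$$

$$\bar{\mathcal{N}}(C_w) = \Theta \setminus \mathcal{N}(C_w), \qquad \mathcal{S}_w = \bar{\mathcal{N}}(C_w) \cap \mathcal{N}(C_{w+1}). \qquad \text{(A.1)}$$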

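Theorem A. Assume:

(i) $\Theta \subset \mathcal{N}(C_{W+1})$ for a finite integer $W$ and $N$ sufficiently large;

(ii) There exist positive $s_1, \ldots, s_W$ and $U(h)$, $V(h)$ such that $R(h) = R(\theta) + U(h) +
V(h)$ and $s_1 < \cdots < s_W$, and as $N \to \infty$, $s_1 \to \infty$ and, for $w = 1, \ldots, W$,

$$P\Big(\inf_{h \in \mathcal{S}_w} \frac{U(h)}{s_w} > \eta\Big) \to 1, \quad \text{some } \eta > 0, \qquad \text{(A.2)}$$

$$\sup_{h \in \mathcal{S}_w} \frac{|V(h)|}{s_w} = o_p(1). \qquad \text{(A.3)}$$

Then

$$\hat\theta = \theta + O_p(C_1), \quad \text{as } N \to \infty,$$

where $O_p(C_1)$ is a $p \times 1$ vector with $i$th element $O_p(C_{i1})$.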

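Proof. We show that $P(\hat\theta \in \bar{\mathcal{N}}(C_1)) \to 0$ as $N \to \infty$. By a standard kind of argument

$$P(\hat\theta \in \bar{\mathcal{N}}(C_1)) \le P\Big(\inf_{h \in \bar{\mathcal{N}}(C_1)} \{R(h) - R(\theta)\} \le 0\Big).$$

Under (i), $\bar{\mathcal{N}}(C_1) \subset \bar{\mathcal{N}}(C_1) \cap \mathcal{N}(C_{W+1}) = \bigcup_{w=1}^{W} \mathcal{S}_w$. Thus the last probability is bounded
by

$$\sum_{w=1}^{W} P\Big(\inf_{h \in \mathcal{S}_w} \{U(h) + V(h)\} \le 0\Big) \le \sum_{w=1}^{W} P\Big(\sup_{h \in \mathcal{S}_w} |V(h)| \ge \inf_{h \in \mathcal{S}_w} U(h)\Big),$$

which is bounded by

$$\sum_{w=1}^{W} \Big\{ P\Big(\inf_{h \in \mathcal{S}_w} \frac{U(h)}{s_w} \le \eta\Big) + P\Big(\sup_{h \in \mathcal{S}_w} \frac{|V(h)|}{s_w} \ge \eta\Big) \Big\}, \qquad \text{(A.4)}$$

which tends to zero on applying (A.2) and (A.3). □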

Three comments are relevant. (1) In the setting of the rest of the paper, $U$ can be
chosen nonstochastic but this is not possible in the context of such stochastic trends as
unit roots, where the more general (A.2) is useful. (2) An almost sure convergence version
of Theorem A is possible under suitably strengthened versions of (A.2) and (A.3).
(3) By comparison with our decomposition of $\bar{\mathcal{N}}(C_1)$ into $\mathcal{S}_1, \ldots, \mathcal{S}_W$, van de Geer [20]
(see pages 69, 70) employed a "peeling device" to obtain an exponential inequality for
$\sup_{g \in \mathcal{G}} \{|Z_N(g)|/\tau(g)\}$, where $Z_N(g)$ is a stochastic process, $\tau(g)$ is a non-negative function
and the set $\mathcal{G}$ is "peeled off" as $\bigcup_{j=1}^{J} \mathcal{G}_j$, where $\mathcal{G}_j = \{g \in \mathcal{G}: m_{j-1} < \tau(g) < m_j\}$, for
an increasing sequence $\{m_j\}$, and $J$ need not be finite. Thus $\sup_{g \in \mathcal{G}_j} \{|Z_N(g)|/\tau(g)\} \le
\{\sup_{g \in \mathcal{G},\, \tau(g) < m_j} |Z_N(g)|\}/m_{j-1}$ and only the supremum of the numerator of the original
statistic need be approximated. There is no denominator like $\tau(g)$ in our problem,
and our decomposition of $\bar{\mathcal{N}}(C_1)$ is designed to suitably balance $U(h)$ and $V(h)$ on
each $\mathcal{S}_w$ to enable choices of the $s_w$ that make all $W$ summands in (A.4) small.
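
The shell decomposition used in Theorem A can be illustrated numerically. The following is a minimal sketch in Python with NumPy, not part of the paper's argument: the parameter value, the half-widths, the doubling rule and the helper names `in_box` and `shell_index` are all illustrative assumptions. It builds nested boxes around $\theta$ and checks that every point lying inside the largest box but outside the smallest one belongs to exactly one shell $\mathcal{S}_w$.

```python
import numpy as np

def in_box(h, theta, c):
    """True if |h_i - theta_i| < c_i for every coordinate i, i.e. h lies in N(c)."""
    return bool(np.all(np.abs(h - theta) < c))

def shell_index(h, theta, C):
    """Return every w (1-based) such that h is outside N(C_w) but inside N(C_{w+1}).

    C has shape (W + 1, p); row w - 1 holds the half-widths C_w, increasing in w.
    """
    W = C.shape[0] - 1
    return [w + 1 for w in range(W)
            if (not in_box(h, theta, C[w])) and in_box(h, theta, C[w + 1])]

rng = np.random.default_rng(0)
theta = np.array([0.7, 1.3])                       # hypothetical true parameter
base = np.array([0.05, 0.10])                      # hypothetical C_1
C = np.array([base * 2.0 ** w for w in range(5)])  # C_1 < ... < C_5, so W = 4

# Random points strictly inside the largest box N(C_{W+1}).
points = theta + rng.uniform(-1.0, 1.0, size=(2000, 2)) * C[-1] * 0.999
counts = [len(shell_index(h, theta, C)) for h in points
          if not in_box(h, theta, C[0])]
print(set(counts))
```

Because the boxes are nested, each such point has a unique largest $w$ for which it is still outside $\mathcal{N}(C_w)$, which is why the shells do not overlap.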

Appendix B: Definitions and proofs of theorems 

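To define $T$, introduce first, for $i = 1, \ldots, d$, the $p_i \times 1$ vector $\phi_i(g_i)$ with $j$th element
$(g_{ij} + 1)^{-1}$ and the $p_i \times p_i$ matrix $\Phi_i(g_i, h_i)$ with $(j,k)$th element $(g_{ij} + h_{ik} + 1)^{-1}$ for $g_i =
(g_{i1}, \ldots, g_{ip_i})'$, $h_i = (h_{i1}, \ldots, h_{ip_i})'$, where $g_{ij}, h_{ij} > -1/2$ for all $i, j$. For $g = (g_1', \ldots, g_d')'$,
$h = (h_1', \ldots, h_d')'$, introduce the $p \times p$ matrix $\Phi(g, h)$ with $(i,j)$th $p_i \times p_j$ block $\Phi_i(g_i, h_i)$
when $i = j$ and $\phi_i(g_i)\phi_j(h_j)'$ when $i \neq j$. Denote $\Phi = \Phi(\theta, \theta)$. Writing $\phi_i = \phi_i(\theta_i)$, $\Phi_i =
\Phi_i(\theta_i, \theta_i)$, define $p \times p$ matrices $\Phi^{+}$, $\Phi^{++}$ with $(i,j)$th $p_i \times p_j$ block $\Phi_i \circ \Phi_i$, $2\Phi_i \circ \Phi_i \circ
\Phi_i$ when $i = j$ and $\phi_i(\phi_j \circ \phi_j)'$, $(\phi_i \circ \phi_i)(\phi_j \circ \phi_j)'$ when $i \neq j$, where "$\circ$" denotes the
Hadamard product. Put $T = \Phi^{++} - \Phi^{+\prime}\Phi^{-1}\Phi^{+}$. Define $B = (\beta_\Delta^{-1}, -I_p)'$, where $\beta_\Delta$ is the
$p \times p$ diagonal matrix such that $\beta_\Delta 1_p = \beta$ and $1_p$ is the $p \times 1$ vector of 1's.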
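
As an illustration of the block structure just defined, the sketch below assembles $\Phi(g,h)$ from the vectors $\phi_i$ (elements $1/(g_{ij}+1)$) and the matrices $\Phi_i$ (elements $1/(g_{ij}+h_{ik}+1)$). It is written in Python with NumPy; the function names `phi_vec`, `phi_block`, `big_phi` and the exponent values are hypothetical choices made only for this example.

```python
import numpy as np

def phi_vec(g):
    """Vector with j-th element 1 / (g_j + 1)."""
    return 1.0 / (np.asarray(g, dtype=float) + 1.0)

def phi_block(g, h):
    """Matrix with (j, k)-th element 1 / (g_j + h_k + 1)."""
    g = np.asarray(g, dtype=float)
    h = np.asarray(h, dtype=float)
    return 1.0 / (g[:, None] + h[None, :] + 1.0)

def big_phi(g_list, h_list):
    """Assemble Phi(g, h): diagonal blocks Phi_i(g_i, h_i), off-diagonal blocks phi_i(g_i) phi_j(h_j)'."""
    rows = []
    for i, (gi, hi) in enumerate(zip(g_list, h_list)):
        row = []
        for j, hj in enumerate(h_list):
            if i == j:
                row.append(phi_block(gi, hi))
            else:
                row.append(np.outer(phi_vec(gi), phi_vec(hj)))
        rows.append(row)
    return np.block(rows)

# Hypothetical exponents: d = 2, p_1 = 2, p_2 = 1, all distinct, non-zero and > -1/2.
theta = [np.array([0.5, 1.5]), np.array([1.0])]
Phi = big_phi(theta, theta)
print(Phi)
print(np.linalg.eigvalsh(Phi))
```

For these exponents $\Phi = \Phi(\theta,\theta)$ is the Gram matrix of the functions $x_i^{\theta_{ij}}$ on the unit square, so its eigenvalues should all be positive.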

Proof of Theorem 1. We have $\hat\theta = \arg\min_{h \in \Theta} R(h)$, $\hat\beta = \hat\beta(\hat\theta)$, where

$$R(h) = Q(\hat\beta(h), h), \qquad \hat\beta(h) = M(h,h)^{-1}\{M(h,\theta)\beta + m(h)\}$$

for $M(g,h) = \sum_{u \in \mathbb{N}} f(u;g)f(u;h)'$, $m(h) = \sum_{u \in \mathbb{N}} f(u;h)x_u$. The subsequent proof implies
that after suitable norming $M(h,h)$ is well conditioned for relevant $h$ and large $N$.
In Theorem A, take $U(h) = \beta' D\Psi(h)D\beta$, $V(h) = V_1(h) - \{V_2(h) - V_2(\theta)\} - \{V_3(h) -
V_3(\theta)\}$, for $V_1(h) = \beta'\{P(h) - D\Psi(h)D\}\beta$, $V_2(h) = 2m(h)'M(h,h)^{-1}M(h,\theta)\beta$, $V_3(h) =
m(h)'M(h,h)^{-1}m(h)$, with $\Psi(h) = \Phi(\theta,\theta) - \Phi(\theta,h)\Phi(h,h)^{-1}\Phi(h,\theta)$, $P(h) = M(\theta,\theta) -
M(\theta,h)M(h,h)^{-1}M(h,\theta)$. Define, for $j = 1, \ldots, p_i$, $i = 1, \ldots, d$, and a finite $W$, positive
scalars $C_{ijw}$, $w = 1, \ldots, W$, such that $C_{ijw} < C_{ij,w+1}$ for each such $w$. Define

$$C_w = (C_{11w}, \ldots, C_{1p_1w}, \ldots, C_{d1w}, \ldots, C_{dp_dw})', \quad w = 1, \ldots, W + 1. \qquad \text{(B.1)}$$

Define neighbourhoods $\mathcal{N}_{ij}(C_{ijw}) = \{h_{ij}: |h_{ij} - \theta_{ij}| < C_{ijw}\}$, $j = 1, \ldots, p_i$, $i = 1, \ldots, d$,
$w = 1, \ldots, W + 1$. Finally, define for $w = 1, \ldots, W + 1$,

$$\mathcal{N}(C_w) = \prod_{i=1}^{d}\prod_{j=1}^{p_i} \mathcal{N}_{ij}(C_{ijw}), \qquad \text{(B.2)}$$

and then $\bar{\mathcal{N}}(C_w)$, $\mathcal{S}_w$ as in (A.1). Take $C_{ij1} = N^{\chi - \zeta_{ij} - 1/2} \sim N^{\chi - 1/2} n_i^{-\theta_{ij}}$, $j = 1, \ldots,
p_i$, $i = 1, \ldots, d$, so we need to show that $P(\hat\theta \in \bar{\mathcal{N}}(C_1)) \to 0$ as $N \to \infty$. We check (i)
and (ii) of Theorem A, where (A.2) reduces to the requirement $\inf_{h \in \mathcal{S}_w} U(h)/s_w > \eta$ for
large enough $N$ and $\eta$ as in (A.2). From (B.1) and (B.2),

S w c<dnT w , 

where 

d Pi 

T w = |J []{hif Ihj -9ij\ > C ijw ;h M : h H e (-1/2, oo), all (fc,Z) ^ 
i=i j=i 

It follows from Proposition 1 that 

i=l j = l 1 i=l j=l 

Thus (A. 2) is satisfied when 

Xi>'^ (%, (B.3) 

Next, (A.3) is implied if

$$\sup_{h \in \mathcal{S}_w} |V_1(h)| = o(s_w), \qquad \text{(B.4)}$$

$$\sup_{h \in \mathcal{S}_w} |V_2(h) - V_2(\theta)| = o_p(s_w), \qquad \text{(B.5)}$$

$$\sup_{h \in \mathcal{S}_w} |V_3(h)| = o_p(s_w), \qquad \text{(B.6)}$$

as $N \to \infty$. Note that in (B.5) we are considering the difference $V_2(h) - V_2(\theta)$ for $h$
suitably close to $\theta$ and this closeness is important in obtaining the desired result, whereas
in the usual kind of consistency proof, for standard, non-mixed rate settings, one more
simply shows the convergence to zero in probability of a suitably normalized $V_2(h)$,
uniformly in $h \in \Theta$. Now (B.6) follows from Proposition 4, while (B.4) and (B.5) follow
from Propositions 2 and 3, respectively, if

$$\sum_{i=1}^{d}\sum_{j=1}^{p_i} N^{1 + 2\zeta_{ij} - \delta^*} C_{ij,w+1}^2 = o(s_w), \qquad \text{(B.7)}$$

where $\delta^* = \min[\min_{1 \le i \le d}\{b_i/2 + \min(b_i\lambda_i, 0)\}, 2\chi]$ (implying $\delta^* > 0$) and

$$\sum_{i=1}^{d}\sum_{j=1}^{p_i} N^{1/2 + \zeta_{ij} + \epsilon} C_{ij,w+1} = o(s_w) \qquad \text{(B.8)}$$

for some $\epsilon > 0$.

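It remains to show that we can choose $W$ and the $s_w$, $C_{ijw}$, to satisfy (i) of Theorem A
and (B.3), (B.7) and (B.8). Now (B.3) holds for $w = 1$ if $s_1 = N^{2\chi}$, and for $w > 1$ if

$$s_w = s_1 N^{(w-1)\delta^*/2} = N^{2\chi + (w-1)\delta^*/2},$$

$$C_{ijw} = C_{ij1} N^{(w-1)\delta^*/4} = N^{\chi - \zeta_{ij} - 1/2 + (w-1)\delta^*/4}, \quad j = 1, \ldots, p_i, \; i = 1, \ldots, d.$$

Since

$$N^{1 + 2\zeta_{ij}} C_{ij1}^2 = s_1, \qquad N^{1 + 2\zeta_{ij} - \delta^*} C_{ij,w+1}^2 = s_1 N^{(w/2 - 1)\delta^*} = s_w N^{-\delta^*/2}$$

for all $i, j$, (B.7) is satisfied. For all $i, j$,

$$N^{1/2 + \zeta_{ij} + \epsilon} C_{ij,w+1} = N^{\chi + \epsilon + w\delta^*/4} = s_w N^{\epsilon - \chi + \delta^*/4 + (1 - w)\delta^*/4} = o(s_w)$$

on taking $\epsilon < \chi - \delta^*/4$, to satisfy (B.8). Finally, for all $i, j$, though $C_{ij1} \to 0$ as $N \to \infty$
(no matter how small $\delta^*$ or how large $\zeta_{ij}$), we have $C_{ijw} \to \infty$ as $N \to \infty$ for large
enough $w$, so there is a finite $W$ to satisfy (i) of Theorem A. □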
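
The concentrated objective $R(h)$ that opens the proof of Theorem 1 can be sketched in a few lines. The example below is only an illustration under strong simplifying assumptions that are not the paper's setting: a single time dimension, one power-law regressor $f(u;h) = u^h$ with no intercept, i.i.d. Gaussian errors, and a grid search over a hypothetical compact set. The function name `profile_objective` and all numerical values are assumptions made for this sketch.

```python
import numpy as np

def profile_objective(h, u, y):
    """R(h): residual sum of squares after regressing y on f(u; h) = u**h."""
    f = u ** h                      # single regressor, no intercept (assumption)
    M = f @ f                       # scalar version of M(h, h)
    beta_h = (f @ y) / M            # beta-hat(h) = M(h, h)^{-1} sum_u f(u; h) y_u
    resid = y - beta_h * f
    return resid @ resid, beta_h

rng = np.random.default_rng(1)
n = 200
u = np.arange(1, n + 1, dtype=float)
theta_true, beta_true = 0.7, 2.0    # hypothetical values
y = beta_true * u ** theta_true + rng.normal(scale=0.5, size=n)

grid = np.arange(0.10, 1.50, 0.01)  # grid over a compact exponent set (assumption)
values = [profile_objective(h, u, y)[0] for h in grid]
theta_hat = float(grid[int(np.argmin(values))])
beta_hat = float(profile_objective(theta_hat, u, y)[1])
print(theta_hat, beta_hat)
```

Since $y_u = \beta' f(u;\theta) + x_u$, the quantity `f @ y` equals $M(h,\theta)\beta + m(h)$ in this scalar case, so `beta_h` matches the expression for $\hat\beta(h)$ given in the proof.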

Proof of Theorem 2. Omitted. □ 

Proof of Theorem 3. Put $a = (h', b')'$, $Q(a) = Q(h, b)$ and define $Q^{(1)}(a) = (\partial/\partial a)Q(a)$,
$Q^{(2)}(a) = (\partial/\partial a')Q^{(1)}(a)$. We have

$$L_+ Q^{(1)}(a) = -2\sum_{u \in \mathbb{N}} \{y_u - b'f(u;h)\} H(u;h,b),$$

where $H(u;h,b) = [(L(u)f(u;h) \circ b)', (Lf(u;h))']'$ with $L = L(n)$ and $L_+ Q^{(2)}(a) L_+ =
\sum_{i=1}^{3} Q_i^{(2)}(a)$, with

$$Q_1^{(2)}(a) = 2\sum_{u \in \mathbb{N}} H(u;h,b)H(u;h,b)',$$

$$Q_2^{(2)}(a) = 2\sum_{u \in \mathbb{N}} \{b'f(u;h) - \beta'f(u;\theta)\} J(u;h,b),$$

$$Q_3^{(2)}(a) = -2\sum_{u \in \mathbb{N}} x_u J(u;h,b),$$

in which $J(u;h,b)$ is the $2p \times 2p$ symmetric matrix with $(i,j)$th $p \times p$ block $L(u)f_\Delta(u;
h)L(u)b_\Delta$ for $i = j = 1$, $L(u)f_\Delta(u;h)L$ for $i = 1$, $j = 2$ and $0$ for $i = j = 2$, $b_\Delta$, $f_\Delta(u;h)$
being the $p \times p$ diagonal matrices such that $b = b_\Delta 1_p$, $f(u;h) = f_\Delta(u;h)1_p$.
By the mean value theorem 

D+L^(a -a) = (D^L+Q™ D+ X L+Q (1) (a), (B.9) 

where Q^ 2 ' is formed from Q^ 2 \a) by evaluating its ith row at a = ctu\, where || —a\\ < 
\\a — a\\, i = 1, . . ., 2p. By Proposition 5 (B.9) is 

{D^L+Q^i^L+D- 1 + OpilogNy^D-'L+Q^ia). 

Let Ba = diag(/3^ , —I p ) and T be the 2p x 2p matrix with p x p blocks Tn = 0, T21 = 
T' 12 = L" 1 A, r 22 = -L~ l K- A'L- 1 , with A = $" 1 $ + T- 1 . Noting Proposition 6 and the 
representations 

BD^L+Q^(a) = 2N- 1 ' 2 ^{L( U ) - L}D~ X f{u; 9)x u , 

MSN 

B A TB A Dl x L + Q^\a) = -2N~ l/2 ^ [{Pa~^')'> {L^K{L{u) - L} - A')']' 

x D^ 1 f(u\9)x Ul 

we obtain from (B.9) 

D + L+\a -a) = -N~ 1/2 B ^ [T^L^) - L} + A']Z) _1 /( w ; 

- iV- 1 / 2 ^[0, (i- x A{i( U ) - LjD-'fiu; d)x u )'}' 

ueN 

+ p ((iogiV)- 2 )7V- 1 / 2 ^((/?AiH) , ,i)' J D- 1 /(";eK. 

The last two terms are O p ((log A^) -1 ) by application of Lemmas 15 and 10, respectively. 
The proof is completed by applying Proposition 7 to the first term. □ 

Appendix C: Propositions 

Proposition 1. For all C w given by (B.l) such that Cij W > 0, j = 1, . . . ,pi, i = 1, . . . ,d, 
there exists rf > such that, for all £9 

d Pi 

inf U^^^NYY fcnf^C 2 
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Proof. Non-singularity of ft) for ft, £ 0, and 

suppCM)- 1 !^*:, (ci) 

e 

where $K$ throughout denotes a finite, positive generic constant, follow from Lemmas 2
and 3, numerators of elements of the inverse being bounded and denominators bounded
away from zero. Now ^(h) = [(I p , 0)S(/i) _1 (J p , 0)'] _1 , where the 2p x 2p matrix S(ft) has 
(i,i)th pxp submatrix $(01(i = 1) + hl(i = 2),61(i = 1) + hl(i = 2)), l(-) denoting the 
indicator function and S(ft) -1 existing on JV(C W ) as implied below. Introduce the 2p x 2p 
orthogonal permutation matrix II defined by II(l2 <8> a) = ({1' 2 <S) a[), . . . , (1' 2 <E) a' d ))' ', for 
any p x 1 vector a with ith pi x 1 subvector a*. Then UE.(h)H' has the form of T in 
Lemma 2 or 3. 

In the Lemma 2 situation, where no Oij is zero and no hij is zero on Af(C w ), we 
have n = 2pi, r = 2p, and v tk =6 tk , k = 1, . . . ,k, Uifc = 6»^ fc _ Pi , k=Pi + l,..., 2p t . De- 
noting Ei(h) = diag{6»ii - ha, . . .,9 lPl - h Wi } and ej(ft) = Amg{Ei(h), -Ei(h)}, e(ft) = 
diag{ei(ft), . . . , ed{h)}, inspection of the results of Lemma 2 indicates that we may write 
(nS(/i)n') _1 = e(ft) _1 Ge(ft) _1 , where the pxp matrix G is non-singular and bounded 
on Af(C w ). Then 

where E(h) = diag{^i(/i), . . . , E d (h)}, G = (7 p ,0)n'Gn(/ p ,0)'. Thus [/(ft) = (3'DE(h) x 
G~ 1 E(h)D[3 > (3'D 2 E(h) 2 p/tr(G), whence the result follows by boundedness of G and 

The details in the Lemma 3 setting, in which either 8%j = for one (i, j), or hij can be 
zero on Af(C w ) for one are too similar to warrant inclusion. □ 



Proposition 2.

$$\sup_{h \in \mathcal{N}(C_w)} |V_1(h)| \le K\sum_{i=1}^{d}\sum_{j=1}^{p_i} N^{1 + 2\zeta_{ij} - \delta^*} C_{ijw}^2. \qquad \text{(C.2)}$$

Proof. Define D(/i) = iVdiag^ 11 , . . . , n* 1 "' , . . . , n h d d \ . . . , n d dPd }, so D = D(9), and 
M( 5) ft) = D(g)- 1 M(g_, h)D(h)- 1 , also Fi(ft) = M(6>, 6) - M(9, ft) - M(h, 0) + M(ft, ft), 
F 2 (h) = {M(6,h) - M{h,h)}M(h,h)- 1 {M{h,0) - M(h,h)}, so we have the identity 
D^Pi^D- 1 = Fiih) ~ F 2 (h). Likewise, *(ft) = #i(ft) - * 2 (ft), where 

*i(ft) = $(0, (9) - $(0, ft) - $(ft, 0) + $(ft, ft), 

# 2 (ft) = /i) - ft) _1 { $ (^> 0) - $0> h)}. 

Thus Vi(ft) = 7ii(ft)-Fi 2 (ft), where V u (h) = p'D{Fi(h)-^i{h)}D/3, i = 1,2. Now V H (ft) 
is bounded by 



i=i 3 -=i i=\ 



Pi 



1 ™; ,1 

— / J Vij{ui/ni)vtf(ui/ni) - / %• (x)u^ (x) da: 
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Pi d p k 



+ kn E E E n i ' 3 n * ke \ u 



i=i j=i k=i i=\ 



where «y (x) = v(x\ Qij , hij) with v defined just before Lemma 6, Vij = Vij (x) dx, Vij 
2«*=i Vij{ui/ni). From Lemma 8 the first modulus is bounded by 



-1 v^™ 

n. 



K\hij - 8 l0 \\h u - 6 u \(\ogn l f/n\ +m ' m(2 ^ ) < KN r 

because logn, < log A^, n i+ min ( 2 ^>°) = (• B .j V 6 i )i+min(2A i ,o) > n 25 * /K. The second mod- 
ulus is bounded by 

\vij - Vij\\v k i\ + \vki - vu\\vij\. (C.3) 

Now 



f 1 n " ] 1/2 ( f 1 ] 1/2 

< < — Y v ki{uk/n k ) 2 > , \vij\ < < / Vij (x f dx [ 

[ nk u k =i J Ivo J 



so from Lemmas 6, 7 and 8, (C.3) is bounded by 

f (lognQ 2 (logrtfc) 2 \, a 1 1 1 /j I 

K i l+mm(A„0) + l+mi„(A fc ,0) f I h H ~ ~ M* 

and the expression in braces is bounded by N~ d . Thus by elementary inequalities, 
sup hgA/ - (Ctt:) |Pi»(/i)| has the bound (C.2). Next, F 2 {h) - ^ 2 (h) is 

{M(9, h) - M(h, h) - $(0, ft) + ft)}M(ft, h^iMQi, 6) - M(h, h)} 

+ {^(6, h) - $0, h)}{M(h, h)- 1 - ft)" 1 }{M(/i, 0) - M(h, h)} (C.4) 
+ {$((9, h) - $0, /i)}$(/i, ft) -1 {M(ft, 9) - M{h, h) - $0, 6) + $(h, h)}. 

The final factor times D(3 has a norm bounded by 



d Pi Pi ( n k -i 

i=l j=l £=1 I 2 u fe =l 1/0 



Vij(x)x hii dx 



d Pi d p k ( n fc ! 

^ 1/2 EE EE"H ^ E («*M) fcM -«« / ^"da 



i=i j=i k=i e=i k Mfc=i 
The first term in braces is 



(C.5) 



— V] v I — ; Oij + ha, + ha ) - / u(ai; % + h u , h l3 + ha) dx. 
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By Lemma 8, this is bounded by 

K\0ij + ha ~ hj - h u \N- 5 ' < K\6 VJ - h l3 \N~ s * . 

After rearrangement as before, and application also of Lemma 8, the second term in 
braces in (C.5) has the same bound. Thus (C.5) is bounded over h£Af(C w ) by K x 
J2i=i Y^jLi iV 1 / 2 " 1 "^ -5 * Cij w . On the other hand, using Lemma 6, 

d rii 

\\PD{*(B, h) - h)}\\ < E N 1/2+c ^C ijw (C.6) 

i=l 3=1 

uniformly in ft G M{C W ). Using (C.l), the contribution to V2i(h) has the bound in (C.2). 
To deal with the contributions from the other two terms in (C.4), standard manipulations 
indicate that it suffices to show that 

d rii 

sup \\{M(h,6)-M(h,h)}Dl3\\<KY / ^NV 2 +^ +s *C ijw , (C.7) 
sup \\M(h, h) - $(h, h)\\ < KN~ 2S * . (C.8) 

Since the elements of M(h,9) — M(h,h) are of form n~ x ^2 u£ ^(ui / n.i) hij Vki{uk /rik) , for 
i = k or i 7^ k, (C.7) follows much as before, using Lemmas 4 and 7. Finally, (C.8) is an 
easy consequence of Lemma 5. □ 

Proposition 3. For any e > 

d m 

sup \V 2 {h)-V 2 (9)\<Kyy j N 1 l 2 +^C l]W . (C.9) 
heN{C w ) . =lj . =1 

Proof. We can write V2(/i) — V^(^) as 

2{m(/i) - m(0)}'/3 + 2m{h)'M{h, /i)~ 1 {A/(/i, 0) - M(ft, /i)}/3 

= 2{m(/i) - fh{6)}'Dp (CIO) 
+ 2m(h)'M(h, /i) _1 {M(/i, 0) - M(ft, ft)}£>/3, 

where m(/i) = D{K)~ l m(K). Now 



s{sup||m(ft)|i} <# (C.ll) 



immediately from Lemma 10. From the proof of Proposition 2, the last term of (C.10) 
thus has the bound (C.9). Next, 

d rii 

\{rh(h)-rh(d)yDp\<Kj2J2 NCzJ $>^KMK 

i=l 3=1 MEN 

and by Lemma 11 its supremum over Af(C w ) has the bound (C.9). □ 
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Proposition 4. 

snp\V 3 (h)\<K. 

Proof. Writing V 3 (h) = fh(h)' M(h,h)- l m(h), the result follows from (C.l), (C.8) 
and (C.ll). □ 

Proposition 5. As N — > oo, 

D^LiQW - Q( 2 )(a)}LD; 1 = O p ((log AO" 2 ). 
Proof. By elementary inequalities, the result follows if 

D^L{Q^(a) ~ Q^(a)}LD+ 1 = O p ((logiV)- 2 ) 

for any a such that ||a — a|| < ||a — a|| . A typical element of Q\ (a) — Q\ (a) is 

2^(log W4 ) Pl (log^) 1 ~ Pl (log^) p2 (logn fc ) 1 - p2 
" eN . (C.12) 

for i= k andi ^ k, and p 1 ,p 2 = 0, 1. We need to show that (C.12) = Op(iV 1+ ^ +Cfcf /(logiV) 2 ). 
With Pij , Pki replaced by , fike , it is bounded by 

if(logiV) 2 — ^ |«f« +e «-ttf« +flM |, » = fc, (C.13) 

ni ui=i 



or by 



K(logiV) 2 £ £ -u?«uH (C.14) 



Ui — 1 Ui — 1 

Note that, for example, 

Yl u ^ = °p n i W sup 



"i=i 



ft i ) ^ ^ ft i 7 



: P (< y ), 



since nf 3 = O p (nf 3 exp(7VX-fe-i/2 i og _/v)) = O p (nf J ), taking x < Cn + |- Then from 
Theorem 2 and Lemma 12, (C.13) is 

O p ((logiV) 3 7V 1+ ^ +c "(iV x -^- 1/2 + N x - ( "- 1/2 )), 

which is O p (N 1+ ^ + ^ u / (logiV) 2 ) as desired, while using Lemma 4, (C.14) is 

O p ((logN) 3 N 1+ ^ +€ke (N x ~^- 1/2 + N x - Cm - 1/2 )), 

which is O p (N 1+l ' ij+ ^ kl! /(log N) 2 ) as desired. Using Theorem 3, it is readily seen 
that (C.12) = O p (iV 1 +^ + ^V(log^) 2 ). 
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(2) (2) 

The only elements of Q\ '(a) — Q\ (a) that are not identically zero are the diagonal 
elements corresponding to the three non-null submatrices in J(u;h,b), and are of form 

2^{^7( W ;0)-/37("^)}uf !J {(logu j; ) 2 A,} P (log^logr l . i ) 1 - p (C.15) 
ueN 

for /7 = 0,1. We have to show this is O p (A 1+2< > iJ ' /(logTV) 2 ). After replacing /3y by fy, it 
is bounded by 



or 



d Pk 
k=l 1=1 tiGN 

Proceeding much as before, this is 

(d p k \ 
(log N) 2 £ 2V 1+C "+^ •N*-^- 1 ' 2 
k=n=i ) 

= O p ((logJ\0 2 iV 1/2+Cy+x ) = O p (A^ 1+2 ^/(log^) 2 ) 

Again, the same bound holds for (C.15). 

Fir 
of form 



(2) (2) 

Finally, Q\ ' (a) - Q y 3 '(a) has non-zero elements at the same locations, and they are 



-2(\ogn t ) 1 - p J2 x um j \ogu t yu e t >> - {pijlogmYu^} (C.16) 

for p = 0, 1, which again will be shown to be O p (N 1+2 ^ /(log TV) 2 ). Replacing fcj by fa 
gives 

-2/3 ij (logn i ) 1 ~ p j nf !J ^ a; u (logUi) p u(w i /n i ; %) 
+ _l)^ Xu (log« i )" u f w } 

tiGN ' 

= O p ((logjV) 2 2V 1 / 2 +^+*) = O p (7V 1+2 ^/(log^) 2 ), 

applying Lemmas 10 and 11, and rr y_ew — 1 = O p ((logiV)|0y- — %|). We can show, as 
before, that (C.16) has the same bound. □ 

Proposition 6. As N oo, 

D+L- 1 Q^(a)- 1 L- 1 D+ = ^BT^B' + ±B A TB A + O p ((log A^ 2 ). 

(2) (2) 

Proof. Clearly Q y 2 '(a)=0. A typical non-zero element of Q3 (a) is 
-2^{(log Ul ) 2 A J } p (log^logn l ) 1 " p u- IJ a;„ 

u6N 
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for p = 0, 1, and from Lemma 10 this is 

O p ((log NfN 1 ' 2 ^) = O p (N 1+2 ^/(\ogN) 2 ) 

as desired. From Lemmas 13 and 14, 

D^Q^D^ 1 = 2 diag{/3 A , I p }(A + 0((\ogN)~ 2 )) diag{/3 A , I P }, 

where A has p x p submatrices Aij such that An = L&L — L$ + — <&' + L + $++, A12 — 
A' 21 = L$L-<i>' + L, A 22 =L$L.Thus A" 1 has p X p submatrices A ij such that A 11 =T -1 , 
A 12 = A 21 ' = A'i- 1 - T" 1 , A 22 = L- 1 *- 1 ^" 1 + ($L - $+)T- 1 (i* - 
It follows that A^ 1 = (I p ,—I p )T~ 1 (Ip 7 —I p y + r. The proof is straightforwardly con- 
cluded. □ 

Proposition 7. As N — > 00, 

^-1/2 £ [T-^L^) - L} + A'].D-7( U ; 0)z„ ^ d %(0, 2tt j F(0)T- 1 ). 



Proof. Write x u = x ul + x u2 for x ul = J2veE M £v£u-v, x u2 = J2 V £E M £,v£u-v, with E M = 
{u: \m\ < M,i = 1, . . .,<£}, Em = Z d \ Em, for positive integer M. For t] > 0, choose M 
such that Y, V £E M < Writing 



= [r- 1 {L(u)-L} + A'}D-\f(u;8), 



we have 



and 



ti6N veE M w£E M u,u—v+w&H 

<(E ^"'Ewi 2 ' 



^ E imi 2 < § E ED 1 + {iog(«i/«i)} 3 ](«i/«i) My < ^ 



ti6N 



116N i=l j'=l 



(C.17) 



by Lemmas 13 and 14. Then (C.17) < Krj 2 . Next write N^ 1 / 2 £) uGN ftjXui 
N ~ 1,2 HweE' £ w x Y,u&e» Zu-r»9u, where 

£' = {w: 1 - M < Wi < m + M,i = 1, . . . , d}, 
E" = {u: max(l, Wi — M) < u,; < min(rii, Wi + M), i = 1, . . . , d}. 
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We may then apply a CLT, with N and thus N* = Y[i=i( n i + 2M) increasing, for 
independent random variables whose squares are uniformly integrable. It remains to
check two aspects. The first is the Lindeberg condition,



1 

— max 

N wEE' 



^ ^ £,u—wQu 



uEE 1 ' 



->• 0, as N -> oo. 



The left side is bounded by 



K K d Pi 

-max|j gil || 2 < -^^max[{log( l i J /n l )} 2 + l](«*/ni) 29y ->• 0, 

t=l 3=1 

since, for some r/>0, 

(Ui/m) 2 ^ < lfa > 0) + < 0) < iV 1 "", | log(«i/ni)| < KlogN. 

The second aspect is the covariance structure: 

e!n~ 1 ^Y^9uXi u\{n- 1 ^Y,3uXi u\ =N- 1 J2Y,^Y,'9u9u+ w -v, (C.18) 

where the primed sum is over all u such that m , u + u> — w £ N . Since M is fixed and | j g u | | < 
KN 1 -^, for some 77 > 0, (C.18) differs by o(l), as N -> 00, from TV" 1 £„ iU)6jEju x 

E ue N5«ff«+tu-«- Using Lemma 16, this differs by o(l) from N- 1 (J2 v ee h £v) 2 J2ueN9ug' u , 
which, by Lemmas 13 and 14 and straightforward calculation and elimination, equals 



v£E M 



E &J {T" 1 $++T- 1 -T- 1 $' + A-A'$ + T- 1 +A'$A + 0(l/log7V)} 

E eOV- 1 +o(i/io g iv)}^f e ^V* -1 



vEE m vEE m 

as iV — s- 00, and the last displayed expression differs by 0(n) from ^f) 2 ^ -1 = 

27tF(0)T- 1 . □ 

Appendix D: Technical lemmas 

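Lemma 1. Let $T$ be an $r \times r$ matrix, with $(i,j)$th $r_i \times r_j$ block $T_{ij}$, $i,j = 1, \ldots, d$, where
$\sum_{i=1}^{d} r_i = r$. Let $t_i$ be a column vector such that $T_{ij} = t_it_j'$, $i \neq j$, and $T_{ii} - t_it_i'$ is positive
definite, $i = 1, \ldots, d$. Then $T$ is non-singular, and $T^{-1}$ has $(i,j)$th $r_i \times r_j$ submatrix

$$T_{ii}^{-1} + \frac{1}{1 - \tau_i}\Big\{1 - \frac{1}{(1 - \tau_i)(1 + \sigma)}\Big\} T_{ii}^{-1}t_it_i'T_{ii}^{-1}, \quad i = j, \qquad \text{(D.1)}$$

and

$$\frac{-T_{ii}^{-1}t_it_j'T_{jj}^{-1}}{(1 - \tau_i)(1 - \tau_j)(1 + \sigma)}, \quad i \neq j, \qquad \text{(D.2)}$$

where $\tau_i = t_i'T_{ii}^{-1}t_i$, $\sigma = \sum_{i=1}^{d} \tau_i/(1 - \tau_i)$.

Proof. Let $\tilde T$ be the $r \times r$ matrix with diagonal blocks $\tilde T_i = T_{ii} - t_it_i'$, and zeros elsewhere,
so $T = \tilde T + tt'$, where $t = (t_1', \ldots, t_d')'$. Now because $\det\{T_{ii} - t_it_i'\} = \det\{T_{ii}\}(1 - \tau_i)$, it
follows that $\tau_i < 1$, and

$$\tilde T_i^{-1} = T_{ii}^{-1}\{I_{r_i} + (1 - \tau_i)^{-1}t_it_i'T_{ii}^{-1}\}, \quad i = 1, \ldots, d. \qquad \text{(D.3)}$$

Then $\tilde T^{-1}$ is the $r \times r$ matrix with diagonal blocks $\tilde T_i^{-1}$. Thus

$$T^{-1} = \tilde T^{-1}\{I_r - (1 + t'\tilde T^{-1}t)^{-1}tt'\tilde T^{-1}\}. \qquad \text{(D.4)}$$

Now $t_i'\tilde T_i^{-1} = (1 + \tau_i(1 - \tau_i)^{-1})t_i'T_{ii}^{-1} = (1 - \tau_i)^{-1}t_i'T_{ii}^{-1}$, $i = 1, \ldots, d$, and so $t'\tilde T^{-1} = \{(1 -
\tau_1)^{-1}t_1'T_{11}^{-1}, \ldots, (1 - \tau_d)^{-1}t_d'T_{dd}^{-1}\}$, and thus $t'\tilde T^{-1}t = \sigma$. From (D.4), the $(i,j)$th $r_i \times r_j$
submatrix of $T^{-1}$, for $i \neq j$, is $-\tilde T_i^{-1}t_it_j'\tilde T_j^{-1}/(1 + t'\tilde T^{-1}t)$, which equals (D.2) on
substituting (D.3), while for $i = j$ it is

$$\tilde T_i^{-1} - \tilde T_i^{-1}t_it_i'\tilde T_i^{-1}/(1 + \sigma),$$

which equals (D.1) after straightforward algebra. □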
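
The block-inverse structure of Lemma 1 lends itself to a quick numerical sanity check. The sketch below (Python with NumPy; block sizes, seed and variable names are arbitrary assumptions) builds a matrix $T$ with off-diagonal blocks $t_it_j'$ and positive definite $T_{ii} - t_it_i'$, then compares one off-diagonal block of the numerically computed inverse with $-T_{ii}^{-1}t_it_j'T_{jj}^{-1}/\{(1-\tau_i)(1-\tau_j)(1+\sigma)\}$, and also prints $1 - t'T^{-1}t$ next to $1/(1+\sigma)$, the identity used in Lemma 3.

```python
import numpy as np

rng = np.random.default_rng(2)
sizes = [2, 3, 2]                                  # hypothetical block sizes r_i
t_blocks = [rng.normal(size=r) for r in sizes]     # the vectors t_i
t = np.concatenate(t_blocks)

# Diagonal blocks T_ii = (positive definite matrix) + t_i t_i',
# so that T_ii - t_i t_i' is positive definite as the lemma requires.
T_diag = []
for r, ti in zip(sizes, t_blocks):
    A = rng.normal(size=(r, r))
    T_diag.append(A @ A.T + np.eye(r) + np.outer(ti, ti))

# Assemble T: off-diagonal blocks t_i t_j', diagonal blocks T_ii.
T = np.outer(t, t)
starts = np.cumsum([0] + sizes)
for i, Tii in enumerate(T_diag):
    T[starts[i]:starts[i + 1], starts[i]:starts[i + 1]] = Tii

tau = [ti @ np.linalg.solve(Tii, ti) for ti, Tii in zip(t_blocks, T_diag)]
sigma = sum(ta / (1.0 - ta) for ta in tau)

T_inv = np.linalg.inv(T)
i, j = 0, 1                                        # check the (1, 2) block
pred = -np.outer(np.linalg.solve(T_diag[i], t_blocks[i]),
                 np.linalg.solve(T_diag[j], t_blocks[j]))
pred /= (1.0 - tau[i]) * (1.0 - tau[j]) * (1.0 + sigma)
actual = T_inv[starts[i]:starts[i + 1], starts[j]:starts[j + 1]]
print(np.max(np.abs(pred - actual)))
print(1.0 - t @ T_inv @ t, 1.0 / (1.0 + sigma))
```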

Lemma 2. Let $T_{ii}$ be a Cauchy matrix, having $(j,k)$th element $(1 + v_{ij} + v_{ik})^{-1}$, and let
the $j$th element of $t_i$ be $(1 + v_{ij})^{-1}$, where $v_{ij} \in (-\tfrac{1}{2}, \infty) \setminus \{0\}$, all $i,j$ and $v_{ij} \neq v_{ik}$, for
$j \neq k$. Then $T$ as defined in Lemma 1 is non-singular, and its inverse $T^{-1}$ has $(i,j)$th
$r_i \times r_j$ block with $(k,\ell)$th element

{i + 2v ik ){i + 2v u) n n l+vu _ +vim 

m— 1 m— 1 

m^k m^l 

11 1 nf^) 2 } <■»> 



1 



4- v ik + vu { v ik (l + v lk )vit{l 4- vu) y Uim 



' ^ s — 1 m— 1 

(l4-2^fc)(l4-2^) -pr (1+Vjk +Vim)(l+Vim) 



n 
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m=1 ( v jm ~ Vjl)Vj m 

/{sft(^) 2+i -4 

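Proof. From page 31 of Knuth [10], $T_{ii}^{-1}$ has $(k,\ell)$th element

$$\prod_{m=1}^{r_i} (1 + v_{ik} + v_{im})(1 + v_{i\ell} + v_{im}) \Big/ \Big\{ (1 + v_{ik} + v_{i\ell}) \prod_{\substack{m=1 \\ m \neq k}}^{r_i} (v_{ik} - v_{im}) \prod_{\substack{m=1 \\ m \neq \ell}}^{r_i} (v_{i\ell} - v_{im}) \Big\}.$$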

For each z define the (r.; + 1) x (r» + 1) non-singular Cauchy matrix Tjt whose first 
rows are (X^,^) and whose last row is 1). Thus, again from page 31 of Knuth [10], 
the {ti + l,rt 4- l)th element of its inverse is (1 — n)" 1 — J|^ 1 (l Thus 

i+a = rj(i+% 1 ) 2 + i-rf- (d.7) 

£=1 

Also, the leading r» x 1 sub-vector of the (r, + l)th column of T!^ -1 is (1 — Ti)~ x T^ U, 
which has fcth clement 



(1 + V ik ) U (1 + "iA; + Uim)(l + Vim) J \ (1 + «ifc)^fc («ifc ~ «tm) JJ (-«tm) > 
m— 1 ' I. m— 1 m— 1 J 

_ 1 + 2^ fc A- (1 + « tfc + i'» m )(l +t)jj 

n t^im ^z/c )Vim 

The proof is completed by substitution and rearrangement. □ 
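
The closed-form Cauchy inverse quoted from Knuth at the start of this proof can be checked numerically. The following sketch (Python with NumPy; the function name `cauchy_inverse` and the exponent values are hypothetical) evaluates the product formula for the inverse of the matrix with $(k,\ell)$th element $(1 + v_k + v_\ell)^{-1}$ and multiplies it against the matrix itself.

```python
import numpy as np

def cauchy_inverse(v):
    """Closed-form inverse of the matrix with (k, l)-th element 1 / (1 + v_k + v_l)."""
    v = np.asarray(v, dtype=float)
    r = v.size
    inv = np.empty((r, r))
    for k in range(r):
        for l in range(r):
            # Numerator: product over m of (1 + v_k + v_m)(1 + v_l + v_m).
            num = np.prod((1.0 + v[k] + v) * (1.0 + v[l] + v))
            # Denominator: (1 + v_k + v_l) times products of differences, skipping m = k and m = l.
            den = ((1.0 + v[k] + v[l])
                   * np.prod(np.delete(v[k] - v, k))
                   * np.prod(np.delete(v[l] - v, l)))
            inv[k, l] = num / den
    return inv

v = np.array([-0.3, 0.4, 1.1, 2.5])        # hypothetical distinct exponents, all > -1/2
T = 1.0 / (1.0 + v[:, None] + v[None, :])
print(np.max(np.abs(cauchy_inverse(v) @ T - np.eye(v.size))))
```

Cauchy matrices become ill-conditioned quickly as exponents get close together, which is one practical reason an explicit inverse is convenient in the proofs.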

Lemma 3. Let $T^+$ be the $(r + 1) \times (r + 1)$ matrix whose first $r$ rows are $(T, t)$ and whose
last row is $(t', 1)$, with $T$ and $t$ defined as in Lemmas 1 and 2. Then

$$T^{+-1} = \begin{pmatrix} T^{-1}(I_r + tt'T^{-1}(1 - t'T^{-1}t)^{-1}) & -T^{-1}t(1 - t'T^{-1}t)^{-1} \\ -(1 - t'T^{-1}t)^{-1}t'T^{-1} & (1 - t'T^{-1}t)^{-1} \end{pmatrix},$$

where $(1 - t'T^{-1}t)^{-1} = 1 + \sigma$.

Proof. From (D.1) and (D.2)

$$t'T^{-1}t = \sum_{i=1}^{d} \frac{\tau_i}{1 - \tau_i} - \frac{1}{1 + \sigma}\sum_{i=1}^{d}\sum_{j=1}^{d} \frac{\tau_i\tau_j}{(1 - \tau_i)(1 - \tau_j)} = \sigma - \frac{\sigma^2}{1 + \sigma} = \frac{\sigma}{1 + \sigma}$$

after routine algebra. Thus $1 - t'T^{-1}t = 1/(1 + \sigma)$, and the proof is readily completed. □
Lemma 4. For a > — 1 



sup 

a>a 



J 



E 



< a. 



Proof. The expression within the modulus is bounded by 



x a dx + 1 = — ^ + 1 < — 



a + 1 



a + 1 



Lemma 5. Abr a > — 1, 



sup 

a>a 



< 



A" 



Proof. The expression within the modulus is 



1 ^ T/ J 
J= 2 hi-VP 



J 



J" 



Jl+min(a,0) 



1/J 1 

x a dx + — 



1-1/ J 



Using the mean value theorem, the first term in (D.8) is bounded by 

r, J / ■ \ a— 1 I I J 

la 



iW K«>»i + ^Er'x«<»)<, 



A' 



ja+l 



The last two integrals in (D.8) are bounded by 



(i/jy +1 + i 



a + l a + 1 
Define, for $s \in [0, 1]$, $v(s; a, b) = s^a - s^b$.

Lemma 6. For $\underline{a} > -\tfrac{1}{2}$,

$$\sup_{a, b \ge \underline{a}} (a - b)^{-2} \int_0^1 v(x; a, b)^2 \, dx \le K.$$



□ 



a; a dx. (D. 



□

Proof. The integral is

$$\frac{1}{2a + 1} - \frac{2}{a + b + 1} + \frac{1}{2b + 1} = \frac{2(a - b)^2}{(2a + 1)(a + b + 1)(2b + 1)} \le K(a - b)^2.$$
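
The algebraic identity behind the proof of Lemma 6 can be verified exactly with rational arithmetic. The sketch below uses Python's standard `fractions` module; the function names and the test pairs $(a, b)$ are illustrative assumptions. It compares the term-by-term integral of $(x^a - x^b)^2$ over $[0,1]$ with the factored form $2(a-b)^2/\{(2a+1)(a+b+1)(2b+1)\}$.

```python
from fractions import Fraction

def integral_v_squared(a, b):
    """Integral over [0, 1] of (x**a - x**b)**2, integrated term by term (needs a, b > -1/2)."""
    return 1 / (2 * a + 1) - 2 / (a + b + 1) + 1 / (2 * b + 1)

def closed_form(a, b):
    """The factored form 2 (a - b)^2 / {(2a + 1)(a + b + 1)(2b + 1)}."""
    return 2 * (a - b) ** 2 / ((2 * a + 1) * (a + b + 1) * (2 * b + 1))

pairs = [(Fraction(-1, 4), Fraction(1, 3)),
         (Fraction(1, 2), Fraction(5, 2)),
         (Fraction(0), Fraction(7, 10))]
for a, b in pairs:
    print(a, b, integral_v_squared(a, b), closed_form(a, b))
```

Dividing by $(a-b)^2$ leaves $2/\{(2a+1)(a+b+1)(2b+1)\}$, which is bounded whenever $a$ and $b$ stay above a fixed level exceeding $-1/2$; this is the uniform bound the lemma asserts.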
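Lemma 7. For $\underline{a} > -\tfrac{1}{2}$,

$$\sup_{a, b \in [\underline{a}, \bar{a}]} \Big\{ (a - b)^{-2} \sum_{j=1}^{J} v\Big(\frac{j}{J}; a, b\Big)^2 \Big\} \le KJ(\log J)^2. \qquad \text{(D.9)}$$

Proof. By the mean value theorem,

$$|v(s; a, b)| \le s^c |\log s| |a - b|, \quad s \in (0, 1], \qquad \text{(D.10)}$$

where $|a - c| \le |a - b|$. Also, for such $c$,

$$s^c \le s^{\underline{a}}, \quad s \in (0, 1]. \qquad \text{(D.11)}$$

Thus the quantity in braces in (D.9) is bounded by

$$K(\log J)^2 \sum_{j=1}^{J} \Big(\frac{j}{J}\Big)^{2\underline{a}} \le KJ(\log J)^2, \qquad \text{(D.12)}$$

because $\underline{a} > -\tfrac{1}{2}$.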



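Lemma 8. For $-1 < \underline{a} \le \bar{a} < \infty$,

$$\sup_{a, b \in [\underline{a}, \bar{a}]} |a - b|^{-1} \Big| \frac{1}{J}\sum_{j=1}^{J} v\Big(\frac{j}{J}; a, b\Big) - \int_0^1 v(x; a, b) \, dx \Big| \le \frac{K(\log J)^2}{J^{1 + \min(\underline{a}, 0)}}.$$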



Proof. The expression within the modulus is 



v[ ^; a, 6 I — v(x; a, b) *>dx+ —v[ — ; a, 6 



J 



1 / 1 



J VJ 



1/J 
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□ 



(D.9) 

(D.10) 
(D.ll) 

(D.12) 

□ 



v(x;a,b)dx. (D.13) 



— 2 J{j-i)/J 

From (D.10) and (D.ll), the last integral is bounded by 

r i/j 

K / a^|loga:|da:|a-&| < K (log J) J-^a- b\, 
Jo 

and the same bound results for the penultimate term of (D.13). By the mean value 
theorem \v(s] a, b) ~ v(s — r; a, b)| is bounded by 



\s c logs- (s-r) c log(s-r)\\a-b\, < r < 1/J, s > 2/ J, 



(D.14) 
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where \a — c| < \a — b\, and the first modulus is bounded by 

\{s c - (s - r) c }logs| + |(s - r) c {logs - log(s - r)}\ 



<s c |logs| 



1-1- 



1 - 



log 1- 



<K— |logs| 



Thus the first term of (D.13) is bounded by \a — b\ times 
= 



r- 



j 



3=1 

Lemma 9. for a > — 1, 



1(Q < 0) H j l(s = 0) H — l(s > 0) 



ja+l 



J 



(D.15) 



□ 

(D.16) 



sup <JJ|ai-&i| 

taj ,bi£[a,a] \j = i 



i=l,2 



j=li=l ^ /JO i=1 



< 



AT(log J) 3 



Jl+min(2a,0) ' 

Proof. The expression within the second modulus is 



^ / I TT^f^; *'^ - TTw(a;;aj,6j) 1 da; 

^•/(3-i)/^U=i V J / 7=1 J 

2 /l \ f 1/J 2 
i=i ^ J / Jo i=i 
Similarly to the proof of Lemma 7, the last term is bounded by 

A/J 2 R7W 7"l 2 2 

X / x^Qogx) 2 dxT\\ai-bi\< \ 2 fJ TUai-h 

J ° i=l J " i=l 

The expression in braces in (D.17) can be written 



v[ -;ai,6i -v(a;;ai,6i) U( -;a 2 ,o : 



J 



J 



+ u(x;ai,6i)<j w( — ]a 2l b 2 ) ~v(x;a 2 ,b 2 ) 



(D.17) 
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Both terms are treated similarly; we consider only the first. From the bounds (D.14), 
(D.15) its first factor is bounded by (K/J)(j/ J)-' 1 (log J)\a± — and its second one 
by K(j/J)- (log J) | a<i — 62 1 . Thus its contribution is 



whence the result follows by an analogous calculation to (D.16). 
Lemma 10. For i = 1, . . . , d and — ^ < a < a < 00, and all q>0 



e{ sup N-^Ysi-Y^ogu^Xu 

Ue[o,a] uerA" 1 ^ 



<K(\ogN) q . 



□ 



(D.18) 



Proof. By summation by parts 



u i = l 



— ) (logwo 9 ^ 



£ 

u i = l 



Ui + 1 



^2(\og£) q x Uu ..., e ,..., Ud + Y (log^) 9 ^, 

t=l u i = l 



where lu,,.,.^...,^ is x u with Ui replaced by i. Thus the expression in the modulus 
in (D.18) is 



u,=l 



j2 (-) {l-a+ury^w+n-^w 



(D.19) 



where 



u h =l 1=1 
k=l,...,d 
k=£i 



The factor in braces in (D.19) is bounded by |a|/u, < K/ui, whereas (ui/ni) a < (ui/rii)- . 
Thus the left side of (D.18) is bounded by 



KN -i,2 £ (^l_ mM +n -V* E \H i (n i )\ 



Ui = l 



n, / m 



< K(\ogn t ) q n; 1/2 -- u-~ 112 + K(\ogN) q < K(\ogN) q , 

Ui=l 
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since a > — ^ and 



£i?;( S ) 2 = E E E E7«x-« 1 ,...,«-m,...,« < ,-» < ,(Iogi)«(bgm)« 

A"iVs(logs) 2 « 



ttfc=l ^=1 Vk=l rn—1 
fc=l,.„,d fe=l,... J d 



<^£(iog S )^E w< 

7?. * » 



□ 



Lemma 11. Fora>— i 



£<J sup |a-&r 1 

^ a,6£ [a, a] 



E«KM;a^)^« 1 <i^iV 1/2 log7V. (D.20) 



Proof. By summation by parts, 

ni n^ — 1 Ui 

E v(ui/m;a, b)x u = E {^KM; «, &) - + l)/n l ; a, 6)} E 

Xu!,....l.....U d ■ 

U i = l Ui = l 1=1 

From (D.14) and (D.15), the expression in braces is bounded by 

' log rii\ f Ui + 1 



K 



K\ogN f Ui \ 3 



< 



Ui \ m 

vru-1 



Thus the left side of (D.20) is bounded by KlogNYZt=i( u i/ n i) Su i lE \ H i( u i)\> which, 
from the proof of Lemma 10 (with q = 0), has the desired bound. □ 

Lemma 12. Let a > — ^ be a scalar and a = a.j be a sequence such that a — a = Op(J~' ) ) 
as J — > oo, for some r\ > 0. Then for all q>0, 

J 

J- 1 -" E( lo g.?Tb' 5 - 3 a \ = P (J^), as J -> oo. 

i=i 

Proof. The left side is bounded by 
J 



E(iogjT' ( 7 ) |j 3 -° - 1| < \j2(\o g jy +1 ( - 



a — a 



<ifJ"/ 2 O p (J-") 7 E(7) =° P ( J_ " /2 ) 



J 



Lemma 13. For a > — ^, there is an r\ > suc/i i/iai for all sufficiently large J, 
1 



J 



a+1 (a + 1) 2 



□ 



(D.21) 
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Proof. The left side is bounded by 
J 



|(log 2; )x a -(logj)r|d a ;+ [\\ogx)x a dx 
■_ Jj-l ,J Jo 



(D.22) 



i=a-'J- 

The first modulus is bounded by 

| \ogx\\x a - f\ + | \og(x/j)\j a < KQogj){(j - If- 1 +3 a - 1 } +.f- 1 

< K(\ogj)j a - 1 

for x G [j - j > 2. Thus the hrst term of (D.23) is 0((log J) J~ a_1 ) for a < 0, 
0((log J) 2 J- 1 ) for a = 0, and 0((logJ)J- 1 ) for a > 0. The last integral is 0(J Q - 1 ). 
Since a > — 1 there is an 77 > to satisfy (D.21). □ 

Lemma 14. For any a > — \, there is an r\ > such that for all sufficiently large J , 



J z — ' \J J a + 1 (a + l) z (a- 



1) S 



<J~ r >. 



Proof. The left side is bounded by 
J 



1 J P 



x) 2 x a -(\ogj) 2 j a \dx + 



1 



J" 



(logx) x a Ax 



The first integrand is bounded by

(l0gx) 2 \x a - f I + I log(!T/i)|| log(xj)|j a < Kilogjff- 1 

as in the proof of Lemma 13; the proof is completed in similar fashion. 
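
Lemmas 13 and 14 compare sums of power functions weighted by $\log$ and $\log^2$ with their integral counterparts. As a rough numerical companion, and assuming that the relevant limits are the standard integrals $\int_0^1 x^a (\log x)^q \, dx = (-1)^q q!/(a+1)^{q+1}$ for $a > -1$ (a calculus fact, not a restatement of the lemmas), the Python sketch below shows how quickly right-endpoint Riemann sums approach these values. The function names and the chosen values of $a$, $q$ and $J$ are illustrative assumptions.

```python
import math
import numpy as np

def riemann_log_moment(a, q, J):
    """J^{-1} * sum_{j=1}^{J} (j/J)**a * log(j/J)**q, a right-endpoint Riemann sum."""
    x = np.arange(1, J + 1, dtype=float) / J
    return float(np.mean(x ** a * np.log(x) ** q))

def exact_log_moment(a, q):
    """Integral over [0, 1] of x**a * log(x)**q for integer q >= 0 and a > -1."""
    return (-1) ** q * math.factorial(q) / (a + 1.0) ** (q + 1)

for a in (-0.3, 0.5):
    for q in (1, 2):
        for J in (10**2, 10**3, 10**4, 10**5):
            err = abs(riemann_log_moment(a, q, J) - exact_log_moment(a, q))
            print(a, q, J, err)
```

The error shrinks more slowly for negative $a$, where the integrand is unbounded near zero, which is consistent with the separate treatment of $a < 0$, $a = 0$ and $a > 0$ in the proof of Lemma 13.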
Lemma 15. For any a > —\ and all sufficiently large N , 



eIn- 1 ' 2 logiui/mXui/niYxul < K. 



Proof. The left side is 

""EE!*) (*) M?.MZh- 



u,v£N 



2<i 



□ 



by Assumption 3 and straightforward application of Lemmas 13 and 14. 



□ 
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Lemma 16. For 01,02 > \, 01,02 > 0, and any finite positive or negative integer M, 
there is an i] > such that for all sufficiently large J, 



7E N" * 



3=1 
< |M|J-" 

Proof. We have 



J I \ \J 



log 9 



J 



log 



'12 



j + M 
J 

J+M 
J 



< 



M fj 



log 92 ( j 



< 



3 \J 
M 
3 



By elementary inequalities the left side of (D.23) is bounded by 

KM (log J) qi+q2 
i 

3=1 

■ log J 



llOgJ£l^lV- . Ql+ a 2 -l 
Jai+a 2 + l Z^- 7 
3=1 

,fl(oi + a 2 <0) l(ai+a 2 =0) 



< A' I A/ 1 (log J) 9l+<?: 
which is 0(|M|J-''). 



J 



3 \ 3 



^1 7. 



(D.23) 



l(ai +a 2 > 0) 



□ 
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